Note: It is very common for a WHERE clause to accidentally eliminate rows in
an OUTER JOIN. Typically, a LEFT OUTER JOIN or RIGHT OUTER JOIN becomes
an INNER JOIN, or a FULL OUTER JOIN becomes a LEFT or RIGHT OUTER
JOIN. Here’s the technical explanation for this symptom: Any null-intolerant
predicate that refers to attributes from a null-supplying table will eliminate
NULL-supplied rows from the result. A null-intolerant predicate is a predicate
that cannot evaluate to true if any of its inputs are NULL. Most SQL predicates,
such as comparisons, LIKE, or IN predicates, are null-intolerant. Examples of
null-tolerant predicates are IS NULL and any predicate p qualified by a
null-tolerant truth value test, such as p IS NOT TRUE. (from “Semantics and
Compatibility of Transact-SQL Outer Joins” by G. N. Paulley, 15 February 2002,
iAnywhere Solutions Technical White Paper, Document Number 1017447.)
3.6 Multi-Table Joins
The syntax of the FROM clause allows for joins among endless numbers of
tables, with or without parentheses to create nested table expressions, and with
or without ON conditions on each join. In most cases, parentheses are not
required, but it is a very good idea to provide an ON condition for every join
operator whenever possible.
<table_expression> ::= <table_term>
| <table_expression>
CROSS JOIN
<table_term>
| <table_expression>
[ <on_condition_shorthand> ] do not use
<join_operator>
<table_term>
[ <on_condition> ] use this instead
<table_term> ::= <table_reference>
| <view_reference>
| <derived_table>
| <procedure_reference>
| "(" <table_expression_list> ")"
| <lateral_derived_table>
<on_condition_shorthand> ::= KEY foreign key columns; do not use
| NATURAL like-named columns; do not use
<join_operator> ::= <inner_join>
| <left_outer_join>
| <right_outer_join>
| <full_outer_join>
In the absence of parentheses, join operators are evaluated from left to right.
That means the first pair of table terms are joined to create a virtual table, then
that virtual table is joined to the third table term to produce another virtual table,
and so on.
The following example shows a four-way join among tables that exist in the
ASADEMO database that ships with SQL Anywhere Studio 9. Here is the
schema for the four tables (customer, product, sales_order, and
sales_order_items) plus two other tables that will appear in later examples
(employee and fin_code):
CREATE TABLE customer (
id INTEGER NOT NULL DEFAULT AUTOINCREMENT,
fname CHAR ( 15 ) NOT NULL,
lname CHAR ( 20 ) NOT NULL,
90 Chapter 3: Selecting
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
address CHAR ( 35 ) NOT NULL,
city CHAR ( 20 ) NOT NULL,
state CHAR ( 16 ) NULL,
zip CHAR ( 10 ) NULL,
phone CHAR ( 12 ) NOT NULL,
company_name CHAR ( 35 ) NULL,
PRIMARY KEY ( id ) );
CREATE TABLE employee (
emp_id INTEGER NOT NULL PRIMARY KEY,
manager_id INTEGER NULL,
emp_fname CHAR ( 20 ) NOT NULL,
emp_lname CHAR ( 20 ) NOT NULL,
dept_id INTEGER NOT NULL,
street CHAR ( 40 ) NOT NULL,
city CHAR ( 20 ) NOT NULL,
state CHAR ( 16 ) NULL,
zip_code CHAR ( 10 ) NULL,
phone CHAR ( 10 ) NULL,
status CHAR(2)NULL,
ss_number CHAR ( 11 ) NULL,
salary NUMERIC ( 20, 3 ) NOT NULL,
start_date DATE NOT NULL,
termination_date DATE NULL,
birth_date DATE NULL,
bene_health_ins CHAR(2)NULL,
bene_life_ins CHAR(2)NULL,
bene_day_care CHAR(2)NULL,
sex CHAR(2)NULL );
CREATE TABLE fin_code (
code CHAR(2)NOTNULL PRIMARY KEY,
type CHAR ( 10 ) NOT NULL,
description CHAR ( 50 ) NULL );
CREATE TABLE product (
id INTEGER NOT NULL,
name CHAR ( 15 ) NOT NULL,
description CHAR ( 30 ) NOT NULL,
size CHAR ( 18 ) NOT NULL,
color CHAR(6)NOTNULL,
quantity INTEGER NOT NULL,
unit_price NUMERIC ( 15, 2 ) NOT NULL,
PRIMARY KEY ( id ) );
CREATE TABLE sales_order (
id INTEGER NOT NULL DEFAULT AUTOINCREMENT,
cust_id INTEGER NOT NULL REFERENCES customer ( id ),
order_date DATE NOT NULL,
fin_code_id CHAR(2)NULL REFERENCES fin_code ( code ),
region CHAR(7)NULL,
sales_rep INTEGER NOT NULL REFERENCES employee ( emp_id ),
PRIMARY KEY ( id ) );
CREATE TABLE sales_order_items (
id INTEGER NOT NULL REFERENCES sales_order ( id ),
line_id SMALLINT NOT NULL,
prod_id INTEGER NOT NULL REFERENCES product ( id ),
quantity INTEGER NOT NULL,
ship_date DATE NOT NULL,
PRIMARY KEY ( id, line_id ) );
Chapter 3: Selecting
91
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The customer table holds information about companies that may buy products,
the product table defines each product for sale, sales_order records each sale to
a customer, and the sales_order_items table is a many-to-many relationship
between product and sales_order to record which products were included in
which orders. There are foreign key relationships among these tables to define
the relationships, and these foreign key relationships are used in the ON condi
-
tions of the four INNER JOIN operations, which gather all the information
about which products were sold to which customers as part of which order:
SELECT customer.company_name,
sales_order.order_date,
product.name,
product.description,
sales_order_items.quantity,
product.unit_price * sales_order_items.quantity AS amount
FROM customer
INNER JOIN sales_order
ON sales_order.cust_id = customer.id
INNER JOIN sales_order_items
ON sales_order_items.id = sales_order.id
INNER JOIN product
ON product.id = sales_order_items.prod_id
ORDER BY customer.company_name,
sales_order.order_date,
product.name;
Here’s how this FROM clause works from a logical point of view:
n
First, rows in customer are joined with rows in sales_order where the cus-
tomer id columns match. The virtual table resulting from the first INNER
JOIN contains all the columns from the customer and sales_order tables.
n
In the second INNER JOIN, the rows from the first virtual table are joined
with rows in sales_order_item where the sales order id columns match.
Note that the columns in the first virtual table may be referred to using their
base table name; e.g., sales_order.order_id in the second ON condition. The
result of the second INNER JOIN is a new virtual table consisting of all the
columns in customer, sales_order, and sales_order_item.
n
In the final INNER JOIN, the rows from the second virtual table are joined
with rows in product where product id columns match. The result of the
final INNER JOIN is a virtual table consisting of columns in all four tables.
Even though this is (conceptually speaking) a single virtual table, individ
-
ual columns may still be referred to using their original table names; e.g.,
customer.company_name in the ORDER BY clause.
The final result set consists of 1,097 rows. Here are the first six rows, showing
the detail of the first three orders placed by Able Inc.:
company_name order_date name description quantity amount
============ ========== ============ ================= ======== ======
Able Inc. 2000-01-16 Sweatshirt Hooded Sweatshirt 36 864.00
Able Inc. 2000-01-16 Sweatshirt Zipped Sweatshirt 36 864.00
Able Inc. 2000-03-20 Baseball Cap Wool cap 24 240.00
Able Inc. 2000-04-08 Baseball Cap Cotton Cap 24 216.00
Able Inc. 2000-04-08 Baseball Cap Wool cap 24 240.00
Able Inc. 2000-04-08 Visor Cloth Visor 24 168.00
Each ON condition applies to the preceding join operator. The following FROM
clause uses parentheses to explicitly show which ON goes with which INNER
92 Chapter 3: Selecting
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
JOIN in the preceding example; note that this particular FROM clause performs
exactly the same function with or without the parentheses:
FROM(((customer
INNER JOIN sales_order
ON sales_order.cust_id = customer.id )
INNER JOIN sales_order_items
ON sales_order_items.id = sales_order.id )
INNER JOIN product
ON product.id = sales_order_items.prod_id )
Parentheses are useful in arithmetic expressions when you have to override the
natural order of execution of the different operators (e.g., if you want addition to
come before multiplication). Even if they’re not required, parentheses in arith
-
metic expressions help the reader understand the order of evaluation. Those
arguments do not apply as strongly to parentheses in the FROM clause. First of
all, there is no difference in precedence among the different join operators like
INNER JOIN and LEFT OUTER JOIN; without parentheses they’re simply
evaluated from left to right. Also, FROM clauses tend to be long, drawn-out
affairs where matching parentheses appear far apart, so they’re not much help to
the reader. Even in the simple example above, it’s hard to see what the parenthe-
ses are doing; an argument can be made that the version without parentheses is
easier to read.
Having said that, parentheses in the FROM clause are sometimes necessary
and helpful. The following example illustrates that point using the four tables in
the ASADEMO database discussed above: customer, product, sales_order, and
sales_order_items. The requirement is to show how many of each kind of shirt
were sold to each customer in Washington, D.C., including combinations of
product and customer that had no sales. In other words, show all the combina-
tions of Washington customers and shirt products, whether or not any actual
sales were made.
At first glance it appears four joins are required: a CROSS JOIN between
customer and product to generate all possible combinations, a LEFT OUTER
JOIN between customer and sales_order to include customers whether or not
they bought anything, a LEFT OUTER JOIN between product and
sales_order_items to include products whether or not any were sold, and an
INNER JOIN between sales_order and sales_order_items to match up the orders
with their order items.
Perhaps it is possible to write these four joins, in the right order, with or
without parentheses, but a simpler solution uses a divide-and-conquer approach:
n
First, separately and independently compute two different virtual tables: the
CROSS JOIN between customer and product, and the INNER JOIN
between sales_order and sales_order_items.
n
Second, perform a LEFT OUTER JOIN between the first and second vir
-
tual tables. Parentheses are used to separate the first step from the second.
Here is the pseudocode for the FROM clause using this approach:
SELECT
FROM ( all the combinations of customer and product )
LEFT OUTER JOIN
( all the matching combinations of sales_order and sales_order_items )
WHERE
Chapter 3: Selecting
93
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The full SELECT is shown below; the FROM clause has only three joins, two
of them nested inside parentheses to create two simple virtual tables. The final
LEFT OUTER JOIN combines these two virtual tables using an ON clause that
refers to all four base tables inside the two virtual tables. The parentheses make
it easy to understand: The CROSS JOIN is the simplest kind of join there is, and
the INNER join is a simple combination of sales_order rows with their associ
-
ated sales_order_items row.
SELECT customer.company_name AS company_name,
product.name AS product_name,
product.description AS product_description,
SUM ( sales_order_items.quantity ) AS quantity,
SUM ( product.unit_price
* sales_order_items.quantity ) AS amount
FROM ( customer
CROSS JOIN product )
LEFT OUTER JOIN
( sales_order
INNER JOIN sales_order_items
ON sales_order_items.id = sales_order.id )
ON customer.id = sales_order.cust_id
AND product.id = sales_order_items.prod_id
WHERE customer.state = 'DC'
AND product.name LIKE '%shirt%'
GROUP BY customer.company_name,
product.name,
product.description
ORDER BY customer.company_name,
product.name,
product.description;
The final result is shown below. There are two customers in Washington, D.C.,
and five different kinds of shirts for sale, making for 10 combinations of cus-
tomer and product. Five combinations had no sales as shown by the NULL
values in quantity and amount, and five combinations did have actual sales.
company_name product_name product_description quantity amount
======================= ============ =================== ======== =======
Hometown Tee's Sweatshirt Hooded Sweatshirt 24 576.00
Hometown Tee's Sweatshirt Zipped Sweatshirt NULL NULL
Hometown Tee's Tee Shirt Crew Neck NULL NULL
Hometown Tee's Tee Shirt Tank Top 24 216.00
Hometown Tee's Tee Shirt V-neck NULL NULL
State House Active Wear Sweatshirt Hooded Sweatshirt 48 1152.00
State House Active Wear Sweatshirt Zipped Sweatshirt 48 1152.00
State House Active Wear Tee Shirt Crew Neck NULL NULL
State House Active Wear Tee Shirt Tank Top NULL NULL
State House Active Wear Tee Shirt V-neck 60 840.00
A star join is a multi-table join between one single “fact table” and several
“dimension tables.” Pictorially, the fact table is at the center of a star, and the
dimension tables are the points of the star, arranged around the central fact
table.
The fact table stores a large number of rows, each containing a single fact;
for example, in the ASADEMO database the sales_order table contains over
600 rows, each containing the record of a single sale. The dimension tables
store information about attributes of those facts; for example, the customer table
contains the name and address of the customer who made the purchase.
94 Chapter 3: Selecting
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Each dimension table is related to the fact table by a foreign key relation
-
ship, with the fact table as the child and the dimension table as the parent. For
example, the sales_order table has foreign key relationships with three dimen
-
sion tables: customer, employee, and fin_code. The employee table contains
more information about the salesperson who took the order, and the fin_code
table has more information about the financial accounting code for the order.
Dimension tables are usually much smaller than the fact table; in the
ASADEMO database there are three times as many rows in the sales_order fact
table than there are in all three dimension tables put together. Dimension tables
also tend to be highly normalized; for example, each customer’s name and
address is stored in one row in the customer table rather than being repeated in
multiple sales_order rows. Star joins are used to denormalize the tables in the
star by gathering data from all of them and presenting it as a single result set.
For more information about normalization, see Section 1.16, “Normalized
Design.”
A star join may be represented as a FROM clause where the fact table
appears first, followed by a series of INNER JOIN operators involving the
dimension tables. The ON clauses on all the joins refer back to the first table,
the fact table. Following is an example that selects all the sales orders in a date
range, together with information from the customer, employee, and fin_code
tables; the sales_order table is the central fact table in this star join.
SELECT sales_order.order_date AS order_date,
sales_order.id AS order_id,
customer.company_name AS customer_name,
STRING ( employee.emp_fname,
'',
employee.emp_lname ) AS rep_name,
fin_code.description AS fin_code
FROM sales_order
INNER JOIN customer
ON sales_order.cust_id = customer.id
INNER JOIN employee
ON sales_order.sales_rep = employee.emp_id
INNER JOIN fin_code
ON sales_order.fin_code_id = fin_code.code
WHERE sales_order.order_date BETWEEN '2000-01-02' AND '2000-01-06'
ORDER BY order_date,
order_id;
Here is the result of the star join, which effectively “denormalizes” four tables
into a single result set:
order_date order_id customer_name rep_name fin_code
========== ======== ===================== =============== ========
2000-01-02 2131 BoSox Club Samuel Singer Fees
2000-01-03 2065 Bloomfields Samuel Singer Fees
2000-01-03 2126 Leisure Time Rollin Overbey Fees
2000-01-06 2127 Creative Customs Inc. James Klobucher Fees
2000-01-06 2135 East Coast Traders Alison Clark Fees
Chapter 3: Selecting
95
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
3.7 SELECT FROM Procedure Call
A SQL Anywhere stored procedure can return a result set, and that result set can
be treated just like a table in a FROM clause.
<procedure_reference> ::= [ <owner_name> "." ] <procedure_name>
"(" [ <argument_list> ] ")"
[ WITH "(" <result_definition_list> ")" ]
[ [ AS ] <correlation_name> ]
<procedure_name> ::= <identifier>
<argument_list> ::= <argument> { "," <argument> }
<argument> ::= <basic_expression>
| <parameter_name> "=" <basic_expression>
<parameter_name> ::= see <parameter_name> in Chapter 8, “Packaging”
<result_definition_list> ::= <result_definition> { "," <result_definition> }
<result_definition> ::= <alias_name> <data_type>
<data_type> ::= see <data_type> in Chapter 1, “Creating”
The advantage to using a stored procedure is that it can contain multiple state
-
ments whereas derived tables and views must be coded as a single query.
Sometimes a difficult problem is made easier by breaking it into separate steps.
For example, consider this convoluted request: Show all the products that con-
tributed to the second- and third-best sales for a single color on a single day in
the worst year for sales, using three of the ASADEMO database tables
described in the previous section — product, sales_order, and
sales_order_items.
A divide-and-conquer approach can be used to solve this problem:
n
First, compute the worst year for total sales.
n
Second, within that year, find the second- and third-best sales for a single
color on a single day.
n
Third, for those combinations of best color and order date, find the match-
ing products; in other words, find the products with matching colors that
were ordered on those dates.
Each of these steps has its challenges, but solving them separately is a lot easier
than writing one single select to solve them all at once. And even if you could
write one query to do everything, other people might have a lot of trouble
understanding what you’ve written, and in some shops maintainability is more
important than elegance.
A stored procedure called p_best_losers_in_worst_year performs the first
two steps: One SELECT computes the total sales for each year, sorts the results
in ascending order by sales amount, and takes the first year and stores it in a
local variable called @worst_year. A second SELECT computes the total sales
by color and date within @worst_year, sorts the results in descending order by
sales amount, and returns the second and third rows (the “best losers”) as the
procedure result set.
The following shows what the procedure looks like. For more information
about the CREATE PROCEDURE statement, see Section 8.9.
CREATE PROCEDURE p_best_losers_in_worst_year()
BEGIN
DECLARE @worst_year INTEGER;
96 Chapter 3: Selecting
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Determine the worst year for total sales.
SELECT FIRST
YEAR ( sales_order.order_date )
INTO @worst_year
FROM product
INNER JOIN sales_order_items
ON product.id = sales_order_items.prod_id
INNER JOIN sales_order
ON sales_order_items.id = sales_order.id
GROUP BY YEAR ( sales_order.order_date )
ORDER BY SUM ( sales_order_items.quantity * product.unit_price ) ASC;
Find the second- and third-best sales for a single color on a
single day in the worst year.
SELECT TOP 2 START AT 2
product.color AS best_color,
sales_order.order_date AS best_day,
SUM ( sales_order_items.quantity * product.unit_price ) AS sales_amount,
NUMBER(*) + 1 AS rank
FROM product
INNER JOIN sales_order_items
ON product.id = sales_order_items.prod_id
INNER JOIN sales_order
ON sales_order_items.id = sales_order.id
WHERE YEAR ( sales_order.order_date ) = @worst_year
GROUP BY product.color,
sales_order.order_date
ORDER BY SUM ( sales_order_items.quantity * product.unit_price ) DESC;
END;
The first SELECT in the procedure puts a single value into the variable
@worst_year. The second query doesn’t have an INTO clause, so its result set is
implicitly returned to the caller when the procedure is called.
You can test this procedure in ISQL as follows:
CALL p_best_losers_in_worst_year();
Here are the second- and third-best color days, together with the sales amounts,
as returned by the procedure call:
best_color best_day sales_amount rank
========== ========== ============ ====
Green 2001-03-24 1728.00 2
Black 2001-03-17 1524.00 3
The third step in the solution uses the procedure call as a table term in the
FROM clause of a query to find the product details:
SELECT DISTINCT
product.id,
product.name,
product.description,
product.color,
best_loser.rank
FROM p_best_losers_in_worst_year() AS best_loser
INNER JOIN product
ON product.color = best_loser.best_color
INNER JOIN sales_order_items
ON product.id = sales_order_items.prod_id
INNER JOIN sales_order
ON sales_order_items.id = sales_order.id
Chapter 3: Selecting
97
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
AND sales_order.order_date = best_loser.best_day
ORDER BY best_loser.rank ASC,
product.id ASC;
Here’s how that SELECT works:
n
The procedure reference p_best_losers_in_worst_year() is coded without
the CALL keyword but with an empty argument list; those are the mini
-
mum requirements for a procedure call in a FROM clause.
n
A correlation name, “best_loser,” is defined, but isn’t necessary; if you
don’t specify an explicit correlation name, the procedure name itself will be
used as the correlation name in the rest of the query.
n
The FROM clause then uses INNER JOIN operators to join rows in
best_loser together with rows in the other three tables — product,
sales_order_items, and sales_order — to find the combinations that match
on color and order date.
n
Finally, the select list returns columns from product plus the rank (second
or third) from best_loser. The DISTINCT keyword is used because the
same product may have been included in more than one sales order on the
same day, and we’re only interested in seeing each different product.
Here is the final result, which shows that one green product contributed to the
second-best day, and three black products contributed to the third-best day:
id name description color rank
=== ============ ================= ===== ====
600 Sweatshirt Hooded Sweatshirt Green 2
302 Tee Shirt Crew Neck Black 3
400 Baseball Cap Cotton Cap Black 3
700 Shorts Cotton Shorts Black 3
A stored procedure can specify column names for its result set in one of two
ways: by making sure each item in the select list has a column name or an alias
name, or by specifying an explicit RESULT clause in the CREATE
PROCEDURE statement. Both of those methods are optional, however, and that
can cause problems for a stored procedure reference in a FROM clause. For
example, if the expression NUMBER(*) + 1 didn’t have the alias name “rank”
explicitly specified in the procedure p_best_losers_in_worst_year presented
above, the reference to best_loser.rank couldn’t be used in the final select list.
Another solution is to add an explicit WITH list to the procedure reference
in the FROM clause. This WITH list specifies the alias names and data types to
be used for each column in the procedure result set, as far as this FROM clause
is concerned. Even if the stored procedure specifies names for the columns in
its result set, the WITH list names override those. Here is the above SELECT
with an explicit WITH list that specifies two alias names that are different from
the names the procedure returns:
SELECT DISTINCT
product.id,
product.name,
product.description,
product.color,
best_loser.ranking
FROM p_best_losers_in_worst_year()
WITH ( best_color VARCHAR(6),
best_day DATE,
98 Chapter 3: Selecting
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
best_sales NUMERIC ( 15, 2 ),
ranking INTEGER )
AS best_loser
INNER JOIN product
ON product.color = best_loser.best_color
INNER JOIN sales_order_items
ON product.id = sales_order_items.prod_id
INNER JOIN sales_order
ON sales_order_items.id = sales_order.id
AND sales_order.order_date = best_loser.best_day
ORDER BY best_loser.ranking ASC,
product.id ASC;
A procedure reference in a FROM clause is executed exactly once, and the
result set is materialized exactly once, if that procedure has an empty argument
list or only receives constant arguments. This can be bad news or good news
depending on your needs. If the procedure returns a lot of unnecessary rows, the
query processor won’t optimize the call and performance may be worse for a
procedure reference than, say, for the equivalent view reference or derived table
if one could be defined. On the other hand, knowing that the procedure will def
-
initely be called exactly once, and the result set materialized, may help you
solve some tricky problems.
In this discussion, materialized means the result set is fully evaluated and
stored in memory or in the temporary file if memory is exhausted. Also, con-
stant argument means an argument that doesn’t change in value while the
FROM clause is evaluated; literals fall into that category, as do program vari-
ables, and expressions involving literals and variables, but not references to
columns in other tables in the FROM clause.
The next section talks about a procedure that receives a variable argument;
i.e., a column from another table in the FROM clause.
3.8 LATERAL Procedure Call
If a column from another table is passed as an argument to a procedure refer
-
ence in a FROM clause, that procedure reference must appear as part of a
LATERAL derived table definition. Also, the other table must appear ahead of
the LATERAL derived table definition and be separated from it by a comma
rather than one of the join operators like INNER JOIN. This is a situation where
the “comma join operator” must be used and the ON condition cannot be used.
Here is the general syntax for a LATERAL derived table:
<lateral_derived_table> ::= LATERAL
<subquery>
[ AS ] <correlation_name>
[ <derived_column_name_list> ]
| LATERAL
"(" <table_expression> ")"
[ AS ] <correlation_name>
[ <derived_column_name_list> ]
Here is the simplified syntax for a join between a table and a procedure refer
-
ence where a column from that table is passed as an argument; this is the only
use of the comma join and the LATERAL keyword that is discussed in this
book:
Chapter 3: Selecting
99
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Không có nhận xét nào:
Đăng nhận xét