There are many trees out there, but what is your favorite join-tree?
Join trees are a graphical representation of the way a series of joins (an N-way join as it is called scientifically) is executed. The most common form of join trees are the left-deep join trees. Actually, these are the only ones that the CBO (i.e., the Oracle Cost Based Optimizer) considers when it chooses a join method other than a Hash Join (i.e., Nested Loops, Merge Join and Cartesian Joins). For Hash Joins in particular, the CBO also considers right-deep join trees from 11g and onward.
In this post, we will describe why right-deep join trees are important for the execution of star schema queries, which are the bread and butter of Data Warehouses. We will show that right-deep join trees make a more efficient use of the workareas (i.e., the memory area used in the PGA, when a hash join, group by, sorting etc operation is executed) during the execution of a series of hash-joins included in a typical star schema query.
More information about join-trees can be found also in the excellent books "Troubleshooting Oracle Performance" by C. Antognini and "Expert Oracle SQL" by Tony Hasler. Also, very informative are the following two blog posts : a) by Kyle Hailey and b) by Tony Hasler.
OK. Lets see what we want to do: We have a typical star schema query (from this point onward will be called a "star query") and we want to see the difference between executing it with a right-deep join tree versus a left-deep join tree.
Fortunately from 11g the CBO can recognize the efficiency of right-deep joint trees for this type of queries and chooses the correct join-tree. So for our example we will use hints in order to force a left-deep join tree. Note that, as we have stated above, for other join methods (nested loops, merge join, cartesian join) only left-deep join trees are considered by the CBO.
Join trees are a graphical representation of the way a series of joins (an N-way join as it is called scientifically) is executed. The most common form of join trees are the left-deep join trees. Actually, these are the only ones that the CBO (i.e., the Oracle Cost Based Optimizer) considers when it chooses a join method other than a Hash Join (i.e., Nested Loops, Merge Join and Cartesian Joins). For Hash Joins in particular, the CBO also considers right-deep join trees from 11g and onward.
In this post, we will describe why right-deep join trees are important for the execution of star schema queries, which are the bread and butter of Data Warehouses. We will show that right-deep join trees make a more efficient use of the workareas (i.e., the memory area used in the PGA, when a hash join, group by, sorting etc operation is executed) during the execution of a series of hash-joins included in a typical star schema query.
More information about join-trees can be found also in the excellent books "Troubleshooting Oracle Performance" by C. Antognini and "Expert Oracle SQL" by Tony Hasler. Also, very informative are the following two blog posts : a) by Kyle Hailey and b) by Tony Hasler.
OK. Lets see what we want to do: We have a typical star schema query (from this point onward will be called a "star query") and we want to see the difference between executing it with a right-deep join tree versus a left-deep join tree.
Fortunately from 11g the CBO can recognize the efficiency of right-deep joint trees for this type of queries and chooses the correct join-tree. So for our example we will use hints in order to force a left-deep join tree. Note that, as we have stated above, for other join methods (nested loops, merge join, cartesian join) only left-deep join trees are considered by the CBO.
Left-Deep Join Trees vs. Right-Deep Join Trees
Let’s see a left-deep
join tree:
Finally, if we want to force a right-deep join tree we can use the swap_join_inputs hint as follows:
select /*+ leading(t1
t2 t3 t4 t5)
use_hash(t2)
use_hash(t3)
use_hash(t4)
use_hash(t5)
use_hash(t2)
use_hash(t3)
use_hash(t4)
use_hash(t5)
no_swap_join_inputs(t2)
no_swap_join_inputs(t3)
no_swap_join_inputs(t4)
no_swap_join_inputs(t5)
*/ *
from t1
join t2 using(id)
join t3 using(id)
join t4 using(id)
join t5 using(id)
Listing 1: A join of 5 tables with hints to
ensure that no hash join input swapping takes place.
This is a left-deep
join tree:
(((t1 --> t2) --> t3) --> t4) --> t5
(((t1 --> t2) --> t3) --> t4) --> t5
(1st join) (2nd join) (3rd join) (4th join)
Figure 1: Left-Deep Join Tree
Plan hash value: 4183782856
--------------------------------------
| Id |
Operation | Name |
--------------------------------------
| 0 | SELECT
STATEMENT | |
|* 1 | HASH JOIN | |
|* 2 | HASH JOIN | |
|* 3 | HASH JOIN |
|
|* 4 | HASH JOIN | |
| 5 | TABLE ACCESS FULL| T1 |
| 6 | TABLE ACCESS FULL| T2 |
| 7 | TABLE ACCESS FULL | T3 |
| 8 | TABLE ACCESS FULL | T4
|
| 9 | TABLE ACCESS FULL | T5
|
--------------------------------------
Plan 1: Left-Deep Join Tree execution plan
In listing 1, we have a 4-way join query.We have used hints appropriate for ensuring that a left-deep join tree will be created (we will give more details for these hints in a while). Lets describe how the processing of this tree takes place (see figure 1 for the tree and plan 1 for the corresponding execution plan):
Now, lets see the same query executed with a right-deep join tree:
In listing 1, we have a 4-way join query.We have used hints appropriate for ensuring that a left-deep join tree will be created (we will give more details for these hints in a while). Lets describe how the processing of this tree takes place (see figure 1 for the tree and plan 1 for the corresponding execution plan):
- The first join to be executed is T1-->T2. A workarea is allocated in the PGA memory in order to hold the so-called "build table" (table T1 in our case - i.e., the 1st table in a hash-join) for the hash join. Lets call it "workarea A"
- Then, the so-called "Probe Table" (table T2 in our case - i.e., the 2nd table in a hash-join) is scanned and each row is hashed to the in-memory build table in workarea A, in order to find a match. Whenever a match is found the row is returned to the parent operation (operation 4)
- A second workarea (workarea B) is created in the PGA in order to create a new "build table" holding the results of the previous join (T1-->T2), in order to execute the next join, which is (T1-->T2)-->T3 . So now we have two workareas open: A and B.
- Table T3, which is the next "Probe Table", is scanned and each row is hashed to the in-memory "build table" for matching rows. The matched rows are returned to the parent operation (operation 3).
- Now for the next join, which is ((T1-->T2)-->T3)-->T4 we need a workarea in order to hold the "build table", which is the resulting rows from the previous join (i.e., (T1-->T2)-->T3)). Right now, workarea A holds the rows of T1 and workarea B holds the results of T1-->T2. So workarea A is available to be reused and store the rows of (T1-->T2)-->T3).
- Similarly, the rows of T4 are scanned and each row is hashed to the in-memory hash table in order to find a matching row. Matching rows are returned to operation 2.
- Next, it is time to reuse workarea B and build an in-memory hash table with the results of the previous join (i.e., ((T1-->T2)-->T3)-->T4). Now the final join which is (((T1-->T2)-->T3)-->T4)-->T5 can be executed.
- Oracle scans the rows of table T5 (the "probe table" of the last join) and try to find matching rows in workarea B. The matching rows are returned as the result of the query.
What is important for our discussion is that during the whole execution of the left-deep join tree no more than two workareas were necessary.
Now, lets see the same query executed with a right-deep join tree:
select /*+ leading(t1
t2 t3 t4 t5)
use_hash(t2)
use_hash(t3)
use_hash(t4)
use_hash(t5)
swap_join_inputs(t2)
swap_join_inputs(t3)
swap_join_inputs(t4)
swap_join_inputs(t5)
*/ *
from t1
join t2 using(id)
join t3 using(id)
join t4 using(id)
join t5 using(id)
Listing 2: A join of 5 tables with hints to ensure a right-depth join tree
T5--> (T4--> (T3--> (T1-->T2)))
(1st join) (2nd join) (3rd join) (4th join)
Figure 2: A Right-Deep Join Tree
Plan hash value: 1811737256
--------------------------------------
| Id |
Operation | Name |
--------------------------------------
| 0 | SELECT
STATEMENT | |
|* 1 | HASH JOIN | |
| 2 | TABLE ACCESS FULL | T5
|
|* 3 | HASH JOIN | |
| 4 | TABLE ACCESS FULL | T4
|
|* 5 | HASH JOIN |
|
| 6 | TABLE ACCESS FULL | T3 |
|* 7 | HASH JOIN |
|
| 8 | TABLE ACCESS FULL| T1 |
| 9 | TABLE ACCESS FULL| T2 |
--------------------------------------
Plan 2: Right-Deep Join Tree execution plan
In Figure 2 we can see the right-deep join tree and in Plan 2 the corresponding execution plan. Lets describe how the processing takes place.
In Figure 2 we can see the right-deep join tree and in Plan 2 the corresponding execution plan. Lets describe how the processing takes place.
- The first join is T5--> ( ... ), so Oracle initiates a workarea (A) for storing the rows of table T5 (the "build table" in this join). In order to scan the "probe table" (i.e., the 2nd table in this join) Oracle needs to execute the second join.
- The 2nd join is T4-->(...) and so Oracle creates one more workarea (B), in order to hash the rows of table T4. Once more the probe table of this join is the results of the next join (the 3rd in the row), which is T3-->(...)
- So, Oracle creates one more workarea (C) to hash the rows of T3 and proceeds in executing the next (and final) join (T1-->T2), which plays the role of the "probe table" for the previous join.
- As you might expect, in order to execute this final join we need yet another one workarea (D) for hashing the rows of T1.
From the above process, it is now crystal clear that when executing a right-deep join tree with N joins, then we need concurrently N workareas allocated in the PGA. Compare this to our previous example where we have showed that a left-deep join trees never requires more that two workareas allocated and you will come to the result that right-deep join trees are more wasteful in memory consumption. Right?... Wrong!
In a Data Warehouse environment where star schemas are prevalent and star queries are the norm, a right-deep join tree might be more memory efficient.
What is special about Star Schema Queries' Joins
A star query includes always a series of joins (an N-way join) between the Dimension tables and the Fact Table. Two are the most important characteristics of this N-way join that distinguishes star queries:
- All dimension tables can be joined only through the Fact Table and not directly (unless you want to run a Cartesian Join between the dimensions)
- The Fact Table is typically a very large table compared to the dimension tables, or better said: the Fact Table is orders of magnitude larger that dimension tables.
So, if we have a fact table FT and the following dimension tables: D1, D2, D2 and D4, then any star query must first join the FT with one dimension table and then the result of this join, must be joined to the next dimension table and so on. As you understand from the very first join we create a very large intermediate result, since we have to join to the FT from the beginning. Then we have to keep "carrying" this large result through all subsequent joins.
Moreover, with the 2nd point above we want to stress the fact that typically a fact table is significantly larger than a dimension table. So if a dimension table has 10K rows, then a fact table will have 1,000K rows or 1,000,000K rows etc.
Now, lets return to our join-trees discussion and take a look at the following right-deep join tree in Figure 3.
Figure 3: A Right-Deep Join Tree for a star-query
As we have seen in order to process this join-tree we will need to allocate four separate workareas in the PGA, concurrently, for storing the rows of the dimension tables D1, D2, D3 and D4. Compare this to the following figure depicting a left-deep join tree for the same star query.
Figure 4: A Left-Deep Join Tree for a star-query
In this case, we have to allocate a very large workarea in the PGA in order to store the hashed rows of the Fact Table, which plays the role of the "build table" in the 1st join with D1. But even if we swap the tables in the 1st join and make D1 the "build table", we would still need a very large workarea for the results of the 1st join, which is the "build table" for the 2nd join (depicted as TempA in Figure 4). So even if we only need two workareas available concurrently during the processing of this left-deep join-tree, we need much more memory, because we have to build a hash table from the rows of the fact table. This is not required for the right-deep join tree and thus it is more memory efficient. Lets see this live with an example.
A Working Example
Lets take an example star query:
Select /*+ gather_plan_statistics */
cust_city, prod_name, count(*), sum(amount_sold)
From sales
fct
Join
customers using(cust_id)
Join
products using(prod_id)
Join
channels using(channel_id)
Join
times using(time_id)
Group by cust_city, prod_name
We run the query as-is
and it takes 10 secs (elapsed time from SQL*Plus). When we don’t use hints, we
see that the CBO selects a right-deep join tree:
nikos@NIKOSDB> @xplan_stats
Enter value for sql_id: 4dv0aawkp7tka
old 42: WHERE sql_id = '&sql_id'
new 42: WHERE sql_id = '4dv0aawkp7tka'
Object
ID OPERATION Name ACTUAL_TIME LAST_STARTS A_ROWS E_ROWS_X_STARTS COST LIO PREAD PWRITES LAST_DEGREE LAST_MEMORY_USED_MBS LAST_TEMPSEG_SIZE_MBS LAST_EXECUTION
--- -------------------------------------------------- ------------------------------ ----------- ------------ ------------ --------------- ------------ ------------ ------------ ------------ ----------- -------------------- --------------------- --------------
0 SELECT STATEMENT_ 1.448 1 35,529 9,137 3,169 0 0
1 HASH_GROUP BY 1.448 1 35,529 31,127 9,137 3,169 0 0 1 5,404 OPTIMAL
2 HASH JOIN_ 1.711 1 918,843 918,843 2,610 3,169 0 0 1 1,261 OPTIMAL
3 PART JOIN FILTER_CREATE :BF0000 .001 1 1,826 1,826 3 11 0 0
4 INDEX_FAST FULL SCAN TIMES_PK .000 1 1,826 1,826 3 11 0 0
5 HASH JOIN_ 1.357 1 918,843 918,843 2,602 3,158 0 0 1 1,204 OPTIMAL
6 TABLE ACCESS_FULL PRODUCTS .000 1 72 72 3 4 0 0
7 HASH JOIN_ 1.011 1 918,843 918,843 2,594 3,154 0 0 1 743 OPTIMAL
8 INDEX_FULL SCAN CHANNELS_PK .000 1 5 5 1 1 0 0
9 HASH JOIN_ .696 1 918,843 918,843 2,588 3,153 0 0 1 3,216 OPTIMAL
10 TABLE ACCESS_FULL CUSTOMERS .007 1 55,500 55,500 405 1,459 0 0
11 PARTITION RANGE_JOIN-FILTER .312 1 918,843 918,843 494 1,694 0 0
12 TABLE ACCESS_FULL SALES .200 20 918,843 18,376,860 494 1,694 0 0
13 rows selected.
Elapsed: 00:00:00.01
Listing 2:With no hints, the CBO chooses a right-depth join tree for the star query
The execution plan and the run-time statistics are taken from V$SQL_PLAN_STATISTICS_ALL. If you scroll far to the right in Listing 2, we can see that the
workarea has been used in “optimal” mode (column "LAST_EXECUTION"), which means that no writing to
disk took place during the hash joins execution. The memory allocated to the workarea (a
total of 5,404 MBs - see column "LAST_MEMORY_USED_MBS") was sufficient so as the hash join and the grouping
operations to be executed solely in memory.
Now, lets force a
left-deep join tree and see the differences in the utilization of the workarea:
Select /*+ gather_plan_statistics
leading(fct,
customers, products, channels, times)
use_hash(customers)
use_hash(products)
use_hash(channels)
use_hash(times)
no_swap_join_inputs(customers)
no_swap_join_inputs(products)
no_swap_join_inputs(channels)
no_swap_join_inputs(times)
*/ cust_city, prod_name, count(*),
sum(amount_sold)
From sales fct
Join
customers using(cust_id)
Join
products using(prod_id)
Join
channels using(channel_id)
Join times
using(time_id)
Group by cust_city, prod_name
This time the elapsed time this time
is 30 secs (compared to 10 secs before). Lets see the execution plan and runtime statistics.
Enter value for sql_id: 0pvkrnhmbmm15
old 42: WHERE sql_id = '&sql_id'
new 42: WHERE sql_id = '0pvkrnhmbmm15'
Object
ID OPERATION Name ACTUAL_TIME LAST_STARTS A_ROWS E_ROWS_X_STARTS COST LIO PREAD PWRITES LAST_DEGREE LAST_MEMORY_USED_MBS LAST_TEMPSEG_SIZE_MBS LAST_EXECUTION
--- -------------------------------------------------- ------------------------------ ----------- ------------ ------------ --------------- ------------ ------------ ------------ ------------ ----------- -------------------- --------------------- --------------------
0 SELECT STATEMENT_ 22.130 1 35,529 18,679 3,428 18,149 18,070
1 HASH_GROUP BY 22.130 1 35,529 31,127 18,679 3,428 18,149 18,070 1 5,396 OPTIMAL
2 HASH JOIN_ 20.447 1 918,843 918,843 12,152 3,428 18,149 18,070 1 59,989 0 1 PASS
3 HASH JOIN_ 17.201 1 918,843 918,843 8,442 3,367 11,669 11,605 1 83,154 0 1 PASS
4 HASH JOIN_ 12.431 1 918,843 918,843 4,865 3,366 8,744 8,680 1 30,032 0 1 PASS
5 HASH JOIN_ 3.333 1 918,843 918,843 2,590 3,192 4,031 3,999 1 35,074 0 1 PASS
6 PARTITION RANGE_ALL .517 1 918,843 918,843 494 1,718 1 0
7 TABLE ACCESS_FULL SALES .341 28 918,843 25,727,604 494 1,718 1 0
8 TABLE ACCESS_FULL CUSTOMERS .014 1 55,500 55,500 405 1,459 0 0
9 TABLE ACCESS_FULL PRODUCTS .000 1 72 72 3 4 0 0
10 INDEX_FULL SCAN CHANNELS_PK .000 1 5 5 1 1 0 0
11 INDEX_FAST FULL SCAN TIMES_PK .001 1 1,826 1,826 3 11 0 0
12 rows selected.
Elapsed: 00:00:00.01
Listing 3:With hints, we force the CBO to choose a left-depth join tree for the star query
It is clear that the
left-deep join tree is less efficient. The hash-joins have been executed in
“one-pass” mode (column "LAST_EXECUTION" - you need to scroll to the right to see it), which means that when Oracle tried to create the “Build Table”
for the hash join it had to write data into disk once and read them back once. This is due to the fact the build table is created for the largest table in the join,
which is the fact table. We can also see from the statistics that this time,
much more memory has been allocated for the workarea (column "LAST_MEMORY_USED_MBS"). And as we have said, the
elapsed time of the query has been tripled.
How-to force a Left-Deep or a Right-Deep Join Tree with hints (hash join input swapping)
So with hash joins (only) the CBO might choose a right-depth tree, or a left-depth tree (available for all join methods). If we want to force a left-deep join tree, or a right-deep join tree, how can we do it? In this case, we have to use the NO_SWAP_JOIN_INPUTS, or SWAP_JOIN_INPUTS hints in conjunction with the LEADING hint, in order to force a specific join order, and the USE_HASH hint in order to force a hash-join method. For example,
If we want to force a left-deep join tree we can use the no_swap_join_inputs hint as follows:
select /*+ leading(t1 t2 t3 t4 t5)
no_swap_join_inputs(t2)
no_swap_join_inputs(t3)
no_swap_join_inputs(t4)
no_swap_join_inputs(t5)
*/ *
from t1
join t2 using(id)
join t3 using(id)
join t4 using(id)
join t5 using(id)
This will generate a left-deep join tree regardless of the join-method used. If you want to force a hash-join method then you also have to use the USE_HASH hint.
select /*+ leading(t1 t2 t3 t4 t5)
use_hash(t2)
use_hash(t3)
use_hash(t4)
use_hash(t5)
swap_join_inputs(t2)
swap_join_inputs(t3)
swap_join_inputs(t4)
swap_join_inputs(t5)
*/ *
from t1
join t2 using(id)
join t3 using(id)
join t4 using(id)
join t5 using(id)
In this case, we also have to use the USE_HASH hint in order to ensure that a hash-join will take place. Otherwise, no right-deep join tree can be chosen when a different join method is used.
No comments:
Post a Comment