Friday, March 13, 2015

Join hints and join ordering, or, why is the optimizer ignoring my join hint?

Join hints and join ordering, or, why is the optimizer ignoring my join hint?
Have you ever tried to use a join hint (e.g., USE_NL, or USE_HASH) in a query, only to realize that the query optimizer stubbornly refuses to obey?

No matter how frustrating this might be for you, there is an explanation for this behavior. One of the main reasons for such an event is the following rule:

Oracle first decides on the join order and then on the join method to use.

In this post, we will describe how join ordering can be the reason for the optimizer to ignore a hint for forcing a specific join method (e.g., nested loops, hash join, sort merge join etc.) Moreover, we will show how one can use the USE_NL or USE_HASH hints (or any hint that forces a specific join method) in combination with the LEADING (or ORDERED) hint (i.e., hints that force a specific join order).

A similar disussion on this topic can be found here (René Nyffenegger's blog)
and here (Jonathan Lewis' blog).

An Example as an Appetizer

Lets start with a simple join query. We have a large table TLARGE and a small table TSMALL. We will join these two tables on the id column. TLARGE has an index on the id column. Our Oracle database version is 11.2.0.1.0. If we dont use any hints at all, then the optimizer chooses the small table as the driving table, and nested loops as the join method. 


nikos@NIKOSDB> create table TSMALL as
  2  select level id, rpad('x',DBMS_RANDOM.random,'x') descr
  3  from dual
  4  connect by level < 100
  5  /

Table created.

Elapsed: 00:00:00.42
nikos@NIKOSDB> exec dbms_stats.gather_table_stats('NIKOS', 'TSMALL')

nikos@NIKOSDB> create table TLARGE as select level id, rpad('x',DBMS_RANDOM.random,'x') descr
  2  from dual
  3  connect by level < 10000
  4  /

Table created.

Elapsed: 00:00:03.92
nikos@NIKOSDB> exec dbms_stats.gather_table_stats('NIKOS', 'TLARGE')

PL/SQL procedure successfully completed.

nikos@NIKOSDB> create index tlarge_idx on tlarge(id)
  2  compute statistics;

Index created.

nikos@NIKOSDB> select * from TSMALL, TLARGE WHERE tsmall.id = tlarge.id
  2  /

Plan hash value: 1644700146

-------------------------------------------------------------------------
| Id  | Operation                    | Name       | E-Rows | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |        |   212 (100)|
|   1 |  NESTED LOOPS                |            |        |            |
|   2 |   NESTED LOOPS               |            |     99 |   212   (0)|
|   3 |    TABLE ACCESS FULL         | TSMALL     |     99 |    14   (0)|
|*  4 |    INDEX RANGE SCAN          | TLARGE_IDX |      1 |     1   (0)|
|   5 |   TABLE ACCESS BY INDEX ROWID| TLARGE     |      1 |     2   (0)|
-------------------------------------------------------------------------

 
Listing 1: For our example query the optimizer has chosen TSMALL, TLARGE as a join order and nested loops as a join method.

In other words, the optimizer has chosen TSMALL, TLARGE as a join order and NESTED LOOPS as the join method. Note that in the execution plan you see twο nested loops instead of one. This does not mean that it's time to visit your ophthalmologist, but it means that a new optimization feature in nested loops join of 11g has kicked in. For more info on this feature you can read this, since it is irrelevant with the scope of this post.

So far so good. Now, lets try to use a hint to force a hash join instead of the nested loops chosen by the optimizer. How about USE_HASH(TSMALL)? Or maybe USE_HASH(TLARGE)? Or even better USE_HASH(TSMALL TLARGE)? Lets see what happens for each one in Listing 2.


nikos@NIKOSDB>  select /*+ USE_HASH(TSMALL) */ * from TSMALL, TLARGE WHERE tsmall.id = tlarge.id
  2  /
Plan hash value: 1644700146

-------------------------------------------------------------------------
| Id  | Operation                    | Name       | E-Rows | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |        |   212 (100)|
|   1 |  NESTED LOOPS                |            |        |            |
|   2 |   NESTED LOOPS               |            |     99 |   212   (0)|
|   3 |    TABLE ACCESS FULL         | TSMALL     |     99 |    14   (0)|
|*  4 |    INDEX RANGE SCAN          | TLARGE_IDX |      1 |     1   (0)|
|   5 |   TABLE ACCESS BY INDEX ROWID| TLARGE     |      1 |     2   (0)|
-------------------------------------------------------------------------

nikos@NIKOSDB> select /*+ USE_HASH(TLARGE) */ * from TSMALL, TLARGE WHERE tsmall.id = tlarge.id
  2  /
Plan hash value: 2985482125

--------------------------------------------------------------------------------------
| Id  | Operation          | Name   | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |        |  1396 (100)|       |       |          |
|*  1 |  HASH JOIN         |        |     99 |  1396   (1)|   732K|   732K| 1185K (0)|
|   2 |   TABLE ACCESS FULL| TSMALL |     99 |    14   (0)|       |       |          |
|   3 |   TABLE ACCESS FULL| TLARGE |   9999 |  1382   (1)|       |       |          |
--------------------------------------------------------------------------------------

nikos@NIKOSDB>  select /*+ USE_HASH(TSMALL TLARGE) */ * from TSMALL, TLARGE WHERE tsmall.id = tlarge.id
  2  /

Plan hash value: 2985482125

--------------------------------------------------------------------------------------
| Id  | Operation          | Name   | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |        |  1396 (100)|       |       |          |
|*  1 |  HASH JOIN         |        |     99 |  1396   (1)|   732K|   732K| 1188K (0)|
|   2 |   TABLE ACCESS FULL| TSMALL |     99 |    14   (0)|       |       |          |
|   3 |   TABLE ACCESS FULL| TLARGE |   9999 |  1382   (1)|       |       |          |
--------------------------------------------------------------------------------------

nikos@NIKOSDB> select /*+ USE_HASH(TLARGE TSMALL) */ * from TSMALL, TLARGE WHERE tsmall.id = tlarge.id
  2  /

Plan hash value: 2985482125

--------------------------------------------------------------------------------------
| Id  | Operation          | Name   | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |        |  1396 (100)|       |       |          |
|*  1 |  HASH JOIN         |        |     99 |  1396   (1)|   732K|   732K| 1187K (0)|
|   2 |   TABLE ACCESS FULL| TSMALL |     99 |    14   (0)|       |       |          |
|   3 |   TABLE ACCESS FULL| TLARGE |   9999 |  1382   (1)|       |       |          |
--------------------------------------------------------------------------------------

 
Listing 2: Try to force hash join with a hint. It seems that the optimizer does not always obey!

3 out of 4, not bad! It looks like the optimizer has obeyed our hint and used a hash join method instead of a nested loops in the three out of four attempts. In particular, the optimizer ignored hint USE_HASH(TSMALL). Why is this happening? Is this hint wrong?

Well, as we said in the beginning the answer lies in the join order.

The Join Method hint must conform to the Join Order

The join order is the sequence of tables that Oracle chooses for joining a series of tables. The first table in the order is the driving table and each table in the row, is joined with the previous intermediate join result. So for example, a join order of T1, T2, T3, T4 means that T1 is the driving table and is joined with T2. Then, the result of this join is joined with T3. Finally, the result of this join is joined with T4.

The join method is the join implementation (or algorithm if you like computer science terms) that Oracle chooses for executing a join. Examples of join methods are the Nested Loops Join, the Hash Join and the Sort-Merge Join.

It is very important to understand that: The optimizer first decides on the join order and then on the join method. The practical implication of this rule is that whenever we use a join hint to force a specific join method, our hint must be "in conformance" with the join order already selected by the optimizer, otherwise it will be ignored by the optimizer. Alternatively, we can use another hint to force a different join order (i.e., LEADING or ORDERED hint), that will be conformed with our join hint.

So for example in our case, the optimizer has chosen TSMALL, TLARGE as the join order. Now, what do you think that the hint USE_HASH(TSMALL) means? Does it mean: "Hey optimizer, use hash join to join TSMALL table!" ?. Nope.  

The hint USE_HASH(TSMALL) means: "if TSMALL is the second table in the join order (or the "next table" in the join order in general) then use a hash join to access it". If you don't believe me, then you can read the manual here. I know that it is not clear but lets look carefully what the manual says: 
"The USE_HASH hint instructs the optimizer to join each specified table with another row source using a hash join."
So, if we have to join the specified table in the hint with "another row source" then this implies (its true, not with the most unambiguous way), that our table comes second (or "next" in the general case) in the join sequence!

Therefore, when we instruct the optimizer with the hint USE_HASH(TSMALL) but the optimizer has already decided that TSMALL is the first table in the join sequence, then our hint is not in conformance with the join ordering decision and thus it is ignored. It's so simple!

What about USE_HASH(TSMALL TLARGE)? Well you might think that this means: "Hey optimizer, join TSMALL and TLARGE with a hash join!" but this is not true. This hint merely means:
"If TSMALL is the next table in the join order, then use hash join to access it, and if TLARGE is the next table in the join order, then use hash join to access it"
So the following is true:
USE_HASH(TSMALL TLARGE) = USE_HASH(TSMALL) USE_HASH(TLARGE)

This explains, why the optimizer obeyed our hints USE_HASH(TSMALL TLARGE) and USE_HASH(TLARGE TSMALL). It is because for both of this hints, the "USE_HASH(TLARGE) part" is in conformance with the decided join order, i.e., TSMALL, TLARGE.

Influencing the Join Order as well as the Join Method

Now, what can we do if we want to use a join method hint that is not in conformance with the chosen join order, as in the case of USE_HASH(TSMALL) in our running example?

Well, in this case we have to influence the join order so as to be in conformance with our join method hint. In Oracle 11g we do this with the LEADING hint. In prior versions we used the ORDERED hint. So, in our case for making the optimizer obey our USE_HASH(TSMALL) hint, we have to also use the following hint LEADING(TLARGE TSMALL) so as to change the join order from TSMALL, TLARGE to TLARGE, TSMALL. Lets see it live in Listing 3 next.

nikos@NIKOSDB> select /*+ LEADING(TLARGE TSMALL) USE_HASH(TSMALL) */ * from TSMALL, TLARGE WHERE tsmall.id = tlarge.id
  2  /
Plan hash value: 4100290483

--------------------------------------------------------------------------------------
| Id  | Operation          | Name   | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |        |  2360 (100)|       |       |          |
|*  1 |  HASH JOIN         |        |     99 |  2360   (1)|    21M|  3603K|   21M (0)|
|   2 |   TABLE ACCESS FULL| TLARGE |   9999 |  1382   (1)|       |       |          |
|   3 |   TABLE ACCESS FULL| TSMALL |     99 |    14   (0)|       |       |          |
--------------------------------------------------------------------------------------
 
Listing 3: This time with the LEADING hint, the optimizer finally obeyed and used a hash join!
Bingo! When we changed the join order with the use of the leading hint so as to be in conformance with our join hint, the optimizer finally obeyed and used a hash join.

USE_HASH and USE_NL with the LEADING hint

Now that we have clarified all that and now it is all crystal clear to you, it is very easy to understand that only the following combinations are meaningful:
/*+ LEADING(X Y) USE_HASH(Y) */  and /*+ LEADING(X Y) USE_NL(Y) */
and is now simple to understand that the following hints will be simply ignored by the optimizer because they are meaningless
/*+ LEADING(X Y) USE_HASH(X) */  and /*+ LEADING(X Y) USE_NL(X) */

Summary

In this post, we have tried to explain why many times the optimizer seems to ignore our join hints based on the "incompatibility" of the join order and the join hint. Behind all these, lies the fact the optimizer first decides on the join order and then on the join  method and thus the latter must be in conformance with the former. To overcome this, you can use the LEADING hint in order to change the join order so as to be in conformance with the join hint.

No comments:

Post a Comment