Have you ever been called to tune a query that runs forever?
Even worse, what if the query is very complex, you don't know anything about the "logic" behind the SQL - since it was written by someone else - and when you see the execution plan it is several pages long?
Moreover, what if the query is for retrieving a list of customers for a business critical campaign that has to run asap, and therefore the beloved management is over your shoulder, waiting for you to solve the problem!
For example such a great tuning method would be the following:
So what can we do? We need a way to get accurate execution statistics at the individual operation level, so as to easily identify the bottlenecks, while the query is still running. What are our options?
Another option is to use one of the greatest-ever Oracle features (available since 10g), the Active Session History (ASH) (licensed with Oracle Diagnostic Pack). Note that from version 11g, ASH has become even greater since it includes columns indicating the individual execution plan operation and thus making the identification of time-consuming steps very easy.
In this post, we will give an example of a complex query, that runs forever and describe how we can tune it (i.e., to identify the bottleneck and then try to fix it) with the use of both of these tools (SQL monitoring and ASH).
(Note: A nice post from N. Savvinov on a similar topic can be found here.)
looks like this:Even worse, what if the query is very complex, you don't know anything about the "logic" behind the SQL - since it was written by someone else - and when you see the execution plan it is several pages long?
Moreover, what if the query is for retrieving a list of customers for a business critical campaign that has to run asap, and therefore the beloved management is over your shoulder, waiting for you to solve the problem!
Problem Definition
Never ending queries, or at least queries that cannot finish within an acceptable time frame are quite common in large Data Warehouses. Unfortunately, in this situation one cannot use methods where execution statistics are gathered first and then analysis of the execution steps can take place. Simply because the query does not finish!For example such a great tuning method would be the following:
- execute the query with the hint /*+ gather_plan_statistics */ (or alternatively issuing an ALTER SESSION SET STATISTICS_LEVEL=ALL prior to query execution), in order to enable the collection of the so-called "row-source statistics" (i.e., execution statistics for each operation (row-source) in the execution plan).
- After the completion of the query, use DBMS_XPLAN.DISPLAY_CURSOR, or directly query V$SQL_PLAN_STATISTICS_ALL, in order to:
- Identify problematic cardinality estimations by the optimizer by comparing actual rows retrieved in each step of the plan versus estimated rows.
- Identify bottlenecks in the plan, i.e., operations that take too long to complete.
So what can we do? We need a way to get accurate execution statistics at the individual operation level, so as to easily identify the bottlenecks, while the query is still running. What are our options?
11g has the Solution!
Well, from version 11g and onwards there is a great feature called Real-Time SQL Monitoring that allows you to monitor a query executing in real-time and provides useful statistics at an individual plan operation level that help you easily (and quickly!) identify the bottleneck in an execution plan. (Note that the Real-Time SQL Monitoring feature is licensed with Oracle Tuning Pack in order to use it in production). If you want a more detailed description of this feature check out this Oracle white paper and of course the documentation (here).Another option is to use one of the greatest-ever Oracle features (available since 10g), the Active Session History (ASH) (licensed with Oracle Diagnostic Pack). Note that from version 11g, ASH has become even greater since it includes columns indicating the individual execution plan operation and thus making the identification of time-consuming steps very easy.
In this post, we will give an example of a complex query, that runs forever and describe how we can tune it (i.e., to identify the bottleneck and then try to fix it) with the use of both of these tools (SQL monitoring and ASH).
(Note: A nice post from N. Savvinov on a similar topic can be found here.)
A real-life Example
So we have a complex query that has been running for some hours and the execution plan, which is retrieved with a call like this:
nkarag@DWHPRD> select * from table( dbms_xplan.display_cursor('&sql_id', '&child_number', 'ALLSTATS LAST'))
2 /
Listing 1: How we retrieve the plan of a running query based on sql_id.
---------------------------------------------------------------------------------------------------------------------Plan hash value: 2144277924
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem | Used-Tmp|
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | |
| 2 | LOAD AS SELECT | | | 256K| 256K| | |
|* 3 | HASH JOIN RIGHT SEMI | | 642K| 25M| 3344K| | |
| 4 | VIEW | VW_NSO_1 | 642K| | | | |
|* 5 | HASH JOIN | | 642K| 47M| 5162K| | |
|* 6 | HASH JOIN | | 642K| 586K| 586K| | |
|* 7 | TABLE ACCESS STORAGE FULL | PRODUCT_DIM | 900 | | | | |
|* 8 | TABLE ACCESS STORAGE FULL | M_W_INSTALL_ASSETS | 10M| | | | |
|* 9 | INDEX STORAGE FAST FULL SCAN | UCM_CUSTOMER_DIM_UK2 | 7462K| | | | |
|* 10 | INDEX STORAGE FAST FULL SCAN | UCM_CUSTOMER_DIM_UK2 | 7462K| | | | |
| 11 | HASH UNIQUE | | 1 | 436K| 436K| | |
| 12 | CONCATENATION | | | | | | |
|* 13 | FILTER | | | | | | |
| 14 | PX COORDINATOR | | | | | | |
| 15 | PX SEND QC (RANDOM) | :TQ10006 | 872K| | | | |
|* 16 | HASH JOIN RIGHT OUTER BUFFERED | | 872K| 686K| 686K| | |
| 17 | BUFFER SORT | | | 4096 | 4096 | | |
| 18 | PX RECEIVE | | 59 | | | | |
| 19 | PX SEND BROADCAST | :TQ10002 | 59 | | | | |
| 20 | TABLE ACCESS STORAGE FULL | SEGMENT_OTE_BASIC_DIM | 59 | | | | |
|* 21 | HASH JOIN | | 872K| 60M| 5023K| 9M (0)| |
| 22 | PX RECEIVE | | 981K| | | | |
| 23 | PX SEND HASH | :TQ10004 | 981K| | | | |
|* 24 | HASH JOIN BUFFERED | | 981K| 84M| 7595K| 21M (0)| 6144 |
| 25 | BUFFER SORT | | | 14M| 1032K| | |
| 26 | PX RECEIVE | | 642K| | | | |
| 27 | PX SEND BROADCAST | :TQ10000 | 642K| | | | |
| 28 | VIEW | | 642K| | | | |
| 29 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K| | | | |
|* 30 | HASH JOIN | | 11M| 278M| 13M| 52M (0)| |
| 31 | BUFFER SORT | | | 46M| 1767K| | |
| 32 | PX RECEIVE | | 7462K| | | | |
| 33 | PX SEND HASH | :TQ10001 | 7462K| | | | |
|* 34 | TABLE ACCESS STORAGE FULL | UCM_CUSTOMER_DIM | 7462K| | | | |
| 35 | PX RECEIVE | | 11M| | | | |
| 36 | PX SEND HASH | :TQ10003 | 11M| | | | |
| 37 | PX BLOCK ITERATOR | | 11M| | | | |
|* 38 | TABLE ACCESS STORAGE FULL | BILLING_ACCOUNT_DIM | 11M| | | | |
| 39 | PX RECEIVE | | 10M| | | | |
| 40 | PX SEND HASH | :TQ10005 | 10M| | | | |
| 41 | PX BLOCK ITERATOR | | 10M| | | | |
|* 42 | TABLE ACCESS STORAGE FULL | CLI_DIM | 10M| | | | |
|* 43 | VIEW | | 642K| | | | |
| 44 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K| | | | |
|* 45 | FILTER | | | | | | |
| 46 | PX COORDINATOR | | | | | | |
| 47 | PX SEND QC (RANDOM) | :TQ20006 | 29649 | | | | |
|* 48 | HASH JOIN | | 29649 | 4663K| 1167K| | |
| 49 | PX RECEIVE | | 32198 | | | | |
| 50 | PX SEND BROADCAST | :TQ20005 | 32198 | | | | |
|* 51 | HASH JOIN BUFFERED | | 32198 | 25M| 3498K| | |
| 52 | BUFFER SORT | | | 73728 | 73728 | | |
| 53 | PX RECEIVE | | 642K| | | | |
| 54 | PX SEND HASH | :TQ20002 | 642K| | | | |
| 55 | VIEW | | 642K| | | | |
| 56 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K| | | | |
| 57 | PX RECEIVE | | 374K| | | | |
| 58 | PX SEND HASH | :TQ20004 | 374K| | | | |
|* 59 | FILTER | | | | | | |
|* 60 | HASH JOIN RIGHT OUTER BUFFERED| | 374K| 686K| 686K| | |
| 61 | BUFFER SORT | | | 73728 | 73728 | | |
| 62 | PX RECEIVE | | 59 | | | | |
| 63 | PX SEND BROADCAST | :TQ20000 | 59 | | | | |
| 64 | TABLE ACCESS STORAGE FULL | SEGMENT_OTE_BASIC_DIM | 59 | | | | |
|* 65 | HASH JOIN | | 11M| 329M| 11M| | |
| 66 | BUFFER SORT | | | 73728 | 73728 | | |
| 67 | PX RECEIVE | | 7462K| | | | |
| 68 | PX SEND HASH | :TQ20001 | 7462K| | | | |
|* 69 | TABLE ACCESS STORAGE FULL| UCM_CUSTOMER_DIM | 7462K| | | | |
| 70 | PX RECEIVE | | 11M| | | | |
| 71 | PX SEND HASH | :TQ20003 | 11M| | | | |
| 72 | PX BLOCK ITERATOR | | 11M| | | | |
|* 73 | TABLE ACCESS STORAGE FULL| BILLING_ACCOUNT_DIM | 11M| | | | |
| 74 | PX BLOCK ITERATOR | | 10M| | | | |
|* 75 | TABLE ACCESS STORAGE FULL | CLI_DIM | 10M| | | | |
|* 76 | VIEW | | 642K| | | | |
| 77 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K| | | | |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("SBL_CUSTOMER_ID"="C1")
5 - access("SBL_CUSTOMER_ID"="SIEBEL_CUSTOMER_ID")
6 - access("A"."SOC_PRODUCT_SK"="PRODUCT_SK")
7 - storage(("A"."ROOT_PRODUCT_DESC"='IP TV' OR "A"."ROOT_PRODUCT_DESC"='TV SAT' OR "A"."ROOT_PRODUCT_DESC"='TV SAT
Hotel'))
filter(("A"."ROOT_PRODUCT_DESC"='IP TV' OR "A"."ROOT_PRODUCT_DESC"='TV SAT' OR "A"."ROOT_PRODUCT_DESC"='TV SAT
Hotel'))
8 - storage("STATUS"='Active')
filter("STATUS"='Active')
9 - storage("SBL_CUSTOMER_ID" IS NOT NULL)
filter("SBL_CUSTOMER_ID" IS NOT NULL)
10 - storage("SBL_CUSTOMER_ID" IS NOT NULL)
filter("SBL_CUSTOMER_ID" IS NOT NULL)
13 - filter( IS NOT NULL)
16 - access("A"."OTE_BASIC_SEGMENT_SK"="B"."OTE_BASIC_SEGMENT_SK")
21 - access("GNV_BILLING_ACCOUNT_NUM"="GNV_ACCOUNT_NUM")
24 - access("SBL_CUSTOMER_ID"="D1"."C1")
30 - access("SBL_CUSTOMER_ID"="SBL_CUSTOMER_ID")
34 - storage("SBL_CUSTOMER_ID" IS NOT NULL)
filter("SBL_CUSTOMER_ID" IS NOT NULL)
38 - storage(:Z>=:Z AND :Z<=:Z)
42 - storage(:Z>=:Z AND :Z<=:Z)
43 - filter("D1"."C1"=:B1)
45 - filter(LNNVL( IS NOT NULL))
48 - access("GNV_BILLING_ACCOUNT_NUM"="GNV_ACCOUNT_NUM")
51 - access("SBL_CUSTOMER_ID"="D1"."C1")
59 - filter(("B"."OTE_BASIC_SEGMENT_GROUP_DESC"='┼╨┴├├┼╦╠┴╘╔┼╙' OR "B"."OTE_BASIC_SEGMENT_GROUP_DESC"='╧╔╩╔┴╩╧╔'))
60 - access("A"."OTE_BASIC_SEGMENT_SK"="B"."OTE_BASIC_SEGMENT_SK")
65 - access("SBL_CUSTOMER_ID"="SBL_CUSTOMER_ID")
69 - storage("SBL_CUSTOMER_ID" IS NOT NULL)
filter("SBL_CUSTOMER_ID" IS NOT NULL)
73 - storage(:Z>=:Z AND :Z<=:Z AND (("ACCOUNT_TYPE_DESC"='┼╨┴├├┼╦╠┴╘╔┼╙' OR "ACCOUNT_TYPE_DESC"='╧╔╩╔┴╩╧╔')))
filter(("ACCOUNT_TYPE_DESC"='┼╨┴├├┼╦╠┴╘╔┼╙' OR "ACCOUNT_TYPE_DESC"='╧╔╩╔┴╩╧╔'))
75 - storage(:Z>=:Z AND :Z<=:Z)
76 - filter("D1"."C1"=:B1)
Listing 2: The execution plan of the problematic query.
---------------------------------------------------------------------------------------------------------------------nkarag@DWHPRD> host cat mons_text.sql
set longchunksize 1000000
set linesize 999
set long 1000000
set heading off verify off autotrace off feedback off
set timing off
set wrap on
set pagesize 1000
accept sql_id prompt "Enter value for sql_id: "
set termout off
spool sm.txt replace
SELECT DBMS_SQLTUNE.report_sql_monitor(
sql_id => '&&sql_id',
type => 'TEXT',
report_level => 'ALL') AS report
FROM dual;
spool off
host cat sm.txt
undef sql_id
Listing 3: A script for retrieving SQL Monitor results for a specific sql_id
---------------------------------------------------------------------------------------------------------------------SQL Plan Monitoring Details (Plan Hash Value=2144277924)
==========================================================================================================================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Write | Write | Cell | Mem | Temp | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | Reqs | Bytes | Offload | | | (%) | (# samples) |
==========================================================================================================================================================================================================================================================
| 0 | SELECT STATEMENT | | | | | | 1 | | | | | | | | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | 1 | | | | | | | | | | |
| 2 | LOAD AS SELECT | | | | 1 | +13 | 1 | 1 | | | 22 | 5MB | | | | | |
| 3 | HASH JOIN RIGHT SEMI | | 642K | 414K | 6 | +8 | 1 | 326K | | | | | | | | 0.05 | Cpu (2) |
| 4 | VIEW | VW_NSO_1 | 642K | 404K | 6 | +8 | 1 | 742K | | | | | | | | | |
| 5 | HASH JOIN | | 642K | 404K | 10 | +4 | 1 | 742K | | | | | | | | 0.05 | Cpu (2) |
| 6 | HASH JOIN | | 642K | 393K | 7 | +2 | 1 | 742K | | | | | | | | 0.07 | Cpu (3) |
| 7 | TABLE ACCESS STORAGE FULL | PRODUCT_DIM | 900 | 1355 | 4 | +1 | 1 | 926 | 376 | 36MB | | | | | | 0.05 | cell multiblock physical read (2) |
| 8 | TABLE ACCESS STORAGE FULL | M_W_INSTALL_ASSETS | 10M | 392K | 5 | +4 | 1 | 31M | 24205 | 23GB | | | 96.31% | | | 0.02 | Cpu (1) |
| 9 | INDEX STORAGE FAST FULL SCAN | UCM_CUSTOMER_DIM_UK2 | 7M | 4659 | 6 | +8 | 1 | 7M | 341 | 315MB | | | | | | 0.07 | Cpu (1) |
| | | | | | | | | | | | | | | | | | cell multiblock physical read (2) |
| 10 | INDEX STORAGE FAST FULL SCAN | UCM_CUSTOMER_DIM_UK2 | 7M | 4659 | 1 | +13 | 1 | 7M | | | | | | | | | |
| -> 11 | HASH UNIQUE | | 1 | 1M | 4168 | +24 | 1 | 0 | | | 85 | 8MB | | 3M | 10M | 0.05 | Cpu (2) |
| -> 12 | CONCATENATION | | | | 4168 | +24 | 1 | 791K | | | | | | | | | |
| -> 13 | FILTER | | | | 4168 | +24 | 1 | 791K | | | | | | | | 0.07 | Cpu (3) |
| -> 14 | PX COORDINATOR | | | | 4168 | +24 | 9 | 791K | 10 | 320KB | | | | | | 0.02 | Cpu (1) |
| -> 15 | PX SEND QC (RANDOM) | :TQ10006 | 872K | 186K | 4167 | +25 | 8 | 796K | | | | | | | | | |
| -> 16 | HASH JOIN RIGHT OUTER BUFFERED | | 872K | 186K | 4174 | +18 | 8 | 796K | 296 | 65MB | 406 | 89MB | | 35M | 112M | | |
| 17 | BUFFER SORT | | | | 2 | +18 | 8 | 472 | | | | | | | | | |
| 18 | PX RECEIVE | | 59 | 4 | 2 | +18 | 8 | 472 | | | | | | | | | |
| 19 | PX SEND BROADCAST | :TQ10002 | 59 | 4 | 1 | +18 | 1 | 472 | | | | | | | | | |
| 20 | TABLE ACCESS STORAGE FULL | SEGMENT_OTE_BASIC_DIM | 59 | 4 | 1 | +18 | 1 | 59 | | | | | | | | | |
| 21 | HASH JOIN | | 872K | 186K | 17 | +18 | 8 | 1M | | | | | | | | 0.07 | Cpu (3) |
| 22 | PX RECEIVE | | 982K | 160K | 2 | +18 | 8 | 717K | | | | | | | | | |
| 23 | PX SEND HASH | :TQ10004 | 982K | 160K | | | | | | | | | | | | | |
| 24 | HASH JOIN BUFFERED | | 982K | 160K | 1 | +18 | | | | | | | | | | 0.07 | Cpu (3) |
| 25 | BUFFER SORT | | | | | | | | | | | | | | | | |
| 26 | PX RECEIVE | | 642K | 217 | | | | | | | | | | | | | |
| 27 | PX SEND BROADCAST | :TQ10000 | 642K | 217 | 1 | +14 | 1 | 3M | | | | | | | | | |
| 28 | VIEW | | 642K | 217 | 1 | +14 | 1 | 326K | | | | | | | | | |
| 29 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K | 217 | 1 | +14 | 1 | 326K | 7 | 5MB | | | | | | | |
| 30 | HASH JOIN | | 11M | 160K | 2 | +17 | | | | | | | | | | 0.26 | Cpu (11) |
| 31 | BUFFER SORT | | | | 2 | +16 | | | | | | | | | | 0.07 | Cpu (3) |
| 32 | PX RECEIVE | | 7M | 141K | | | | | | | | | | | | | |
| 33 | PX SEND HASH | :TQ10001 | 7M | 141K | 5 | +14 | 1 | 7M | | | | | | | | | |
| 34 | TABLE ACCESS STORAGE FULL | UCM_CUSTOMER_DIM | 7M | 141K | 5 | +14 | 1 | 7M | 14649 | 8GB | | | 92.70% | | | 0.05 | Cpu (2) |
| 35 | PX RECEIVE | | 11M | 18775 | | | | | | | | | | | | | |
| 36 | PX SEND HASH | :TQ10003 | 11M | 18775 | 3 | +17 | 8 | 11M | | | | | | | | | |
| 37 | PX BLOCK ITERATOR | | 11M | 18775 | 3 | +17 | 8 | 11M | | | | | | | | | |
| 38 | TABLE ACCESS STORAGE FULL | BILLING_ACCOUNT_DIM | 11M | 18775 | 3 | +17 | 120 | 11M | 13150 | 7GB | | | 80.00% | | | 0.17 | Cpu (6) |
| | | | | | | | | | | | | | | | | | cell smart table scan (1) |
| 39 | PX RECEIVE | | 10M | 25360 | 17 | +18 | 8 | 10M | | | | | | | | | |
| 40 | PX SEND HASH | :TQ10005 | 10M | 25360 | 1 | +21 | | | | | | | | | | 0.05 | Cpu (2) |
| 41 | PX BLOCK ITERATOR | | 10M | 25360 | | | | | | | | | | | | | |
| 42 | TABLE ACCESS STORAGE FULL | CLI_DIM | 10M | 25360 | 6 | +19 | | | | | | | | | | 0.95 | Cpu (5) |
| | | | | | | | | | | | | | | | | | cell single block physical read (35) |
| -> 43 | VIEW | | 642K | 217 | 4168 | +24 | 627K | 627K | | | | | | | | 70.76 | Cpu (2981) |
| -> 44 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K | 217 | 4168 | +24 | 627K | 4G | 1 | 32768 | | | | | | 26.73 | Cpu (1124) |
| | | | | | | | | | | | | | | | | | resmgr:cpu quantum (2) |
| 45 | FILTER | | | | | | | | | | | | | | | | |
| 46 | PX COORDINATOR | | | | | | | | | | | | | | | | |
| 47 | PX SEND QC (RANDOM) | :TQ20006 | 29649 | 186K | | | | | | | | | | | | | |
| 48 | HASH JOIN | | 29649 | 186K | | | | | | | | | | | | | |
| 49 | PX RECEIVE | | 32198 | 160K | | | | | | | | | | | | | |
| 50 | PX SEND BROADCAST | :TQ20005 | 32198 | 160K | | | | | | | | | | | | | |
| 51 | HASH JOIN BUFFERED | | 32198 | 160K | | | | | | | | | | | | | |
| 52 | BUFFER SORT | | | | | | | | | | | | | | | | |
| 53 | PX RECEIVE | | 642K | 217 | | | | | | | | | | | | | |
| 54 | PX SEND HASH | :TQ20002 | 642K | 217 | | | | | | | | | | | | | |
| 55 | VIEW | | 642K | 217 | | | | | | | | | | | | | |
| 56 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K | 217 | | | | | | | | | | | | | |
| 57 | PX RECEIVE | | 374K | 160K | | | | | | | | | | | | | |
| 58 | PX SEND HASH | :TQ20004 | 374K | 160K | | | | | | | | | | | | | |
| 59 | FILTER | | | | | | | | | | | | | | | | |
| 60 | HASH JOIN RIGHT OUTER BUFFERED | | 374K | 160K | | | | | | | | | | | | | |
| 61 | BUFFER SORT | | | | | | | | | | | | | | | | |
| 62 | PX RECEIVE | | 59 | 4 | | | | | | | | | | | | | |
| 63 | PX SEND BROADCAST | :TQ20000 | 59 | 4 | | | | | | | | | | | | | |
| 64 | TABLE ACCESS STORAGE FULL | SEGMENT_OTE_BASIC_DIM | 59 | 4 | | | | | | | | | | | | | |
| 65 | HASH JOIN | | 11M | 160K | | | | | | | | | | | | | |
| 66 | BUFFER SORT | | | | | | | | | | | | | | | | |
| 67 | PX RECEIVE | | 7M | 141K | | | | | | | | | | | | | |
| 68 | PX SEND HASH | :TQ20001 | 7M | 141K | | | | | | | | | | | | | |
| 69 | TABLE ACCESS STORAGE FULL | UCM_CUSTOMER_DIM | 7M | 141K | | | | | | | | | | | | | |
| 70 | PX RECEIVE | | 11M | 18776 | | | | | | | | | | | | | |
| 71 | PX SEND HASH | :TQ20003 | 11M | 18776 | | | | | | | | | | | | | |
| 72 | PX BLOCK ITERATOR | | 11M | 18776 | | | | | | | | | | | | | |
| 73 | TABLE ACCESS STORAGE FULL | BILLING_ACCOUNT_DIM | 11M | 18776 | | | | | | | | | | | | | |
| 74 | PX BLOCK ITERATOR | | 10M | 25360 | | | | | | | | | | | | | |
| 75 | TABLE ACCESS STORAGE FULL | CLI_DIM | 10M | 25360 | | | | | | | | | | | | | |
| -> 76 | VIEW | | 642K | 217 | 4168 | +24 | 627K | 627K | | | | | | | | | |
| -> 77 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FD9FD04F_8289D6BF | 642K | 217 | 4168 | +24 | 627K | 4G | | | | | | | | | |
==========================================================================================================================================================================================================================================================
Listing 4: SQL Monitoring results for the problematic query.
---------------------------------------------------------------------------------------------------------------------So, it is quite easy to understand the problem of this execution plan is this extremely inefficient filter operation (which executes in a nested loop fashion). See how easy it was, when you use the right tools?
Can we have the same results with ASH? Oh yes we can!
Here is a little script that identifies the plan operation that runs for the largest percent of wall-clock time (the script gives you an option to choose sorting of plan operations by wall-clock time descending or by DB time descending. With the former you can identify the most time consuming step while with the latter you can identify the most resource consuming step of the plan):
nkarag@DWHPRD> host cat ash_ops.sql
set pagesize 999
set lines 999
def how_to_sort = "W"
accept how_to_sort prompt "Do you want operations sorted by DBtime desc (D) or Wall Clock time desc (W) - default is W:"
col sort_by_dbtime noprint new_value _SORT_BY_DBTIME
col sort_by_wctime noprint new_value _SORT_BY_WCTIME
set timing off
select decode(upper(nvl('&&how_to_sort','W')),'D','','--') sort_by_dbtime,
decode(upper(nvl('&&how_to_sort','W')),'W','','--') sort_by_wctime
from dual;
set timing on
select SQL_EXEC_START, SQL_PLAN_LINE_ID, SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, owner, object_name, object_type,
count(*) "DB Time (secs)",
count(distinct sample_time) "WC Time (secs)",
db_time_secs "DB Time Total (secs)",
wc_time_secs "WC Time Total (secs)",
round(count(*)/db_time_secs *100) "DB Time (%)",
round(count(distinct sample_time)/wc_time_secs *100) "WC Time (%)"
from (
select SQL_EXEC_START, sample_time, SQL_PLAN_LINE_ID, A.SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, owner, object_name, object_type, --count(*) over() cnttot,
count(*) over(partition by SQL_EXEC_START) db_time_secs,
count(distinct sample_time) over(partition by SQL_EXEC_START) wc_time_secs
from gv$active_session_history a left outer join dba_objects b on(a.CURRENT_OBJ# = b.object_id)
where
SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
and sql_id = nvl('&sql_id',sql_id) and SQL_CHILD_NUMBER = nvl('&SQL_CHILD_NUMBER',0)
and sql_exec_id is not null
)
group by SQL_EXEC_START, db_time_secs, wc_time_secs, SQL_PLAN_LINE_ID, SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, owner, object_name, object_type --, cnttot
order by SQL_EXEC_START desc,
&_SORT_BY_DBTIME "DB Time (%)" desc
&_SORT_BY_WCTIME "WC Time (%)" desc
/
Listing 5: A script for returning the plan operation of a running query that consumed the most wall-clock time based on ASH.
---------------------------------------------------------------------------------------------------------------------
SQL_EXEC_START SQL_PLAN_LINE_ID SQL_PLAN_OPERATION SQL_PLAN_OPTIONS OWNER OBJECT_NAME OBJECT_TYPE DB Time (secs) WC Time (secs) DB Time Total (secs) WC Time Total (secs) DB Time (%) WC Time (%)
------------------- ---------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------- -------------- -------------- -------------------- -------------------- ----------- -----------
27/01/2015 16:17:41 43 VIEW 2595 2595 3592 3592 72 72
27/01/2015 16:17:41 44 TABLE ACCESS STORAGE FULL 989 989 3592 3592 28 28
27/01/2015 16:17:41 13 FILTER 5 5 3592 3592 0 0
27/01/2015 16:17:41 14 PX COORDINATOR 2 2 3592 3592 0 0
27/01/2015 16:17:41 11 HASH UNIQUE 1 1 3592 3592 0 0
Listing 6: Results of ASH for the problematic query.
---------------------------------------------------------------------------------------------------------------------An even better approach would be to first run the ASH script in order to identify the most consuming operation and then, run the SQL monitoring script to get an overall view of the plan and get the more accurate execution time metrics, execution counts of each step etc. Imagine that in a very long plan, it is difficult for the human eye to catch the largest values in column "Time Active(s)".
So clearly SQL monitoring ans ASH have helped us easily and quickly identify the problem. Now, of course we have to find a fix.
How to fix the problem
In the previous section we showed how easy it is to identify the "bottleneck operations" of a running query with the help of SQL monitoring and/or ASH in version 11g. Once you identify the problem, then you have only traveled half-way because you also have to fix it. The fix of course depends by the specific case at hand (see some brief categorization of problems here).
For our running example the problem lies in the FILTER operation (operation 13 in Listing 4). The problematic full table scan (operation 44) is a child operation of this filter. The FILTER operation with two child sources is a clear indicator that the optimizer did not manage to unnest a subquery. After some careful examination of the SQL statement, we easily managed to find the pain point! It was an IN subquery after an OR condition. Something with the following form:
SELECT
FROM
WHERE
<condition1>
AND <condition2>
AND (
<confition3>
OR <condition 4>
OR <condition 5>
OR <values> IN (subquery)
)
Once we removed the bold part above from the problematic statement the query finished in 35 seconds! Actually as it turned out the this condition was redundant and could be discarded all together.
Summary
In this post we have shown a real-life example of how to identify efficiently the most time consuming operation in the execution plan of a running query that does not finish. We have used two great 11g features, namely:
- Real-time SQL Monitoring
- Active Session History (essentially the new columns added in 11g that reveal the operation running)
A similar example with the use of ASH we have posted in the past here.
If you have to tune queries as your daily job then you must learn to use SQL monitoring and ASH really good. They are great life saviors!
Nice Post. I will surely recommend it to my friends. Warehouses Chicago
ReplyDeleteThank you Julia!
ReplyDelete