The Problem
SQL performance degradation due to a change of plan is one problem, fixing the problem fast so as to allow production ETL flows to unstack is another one, ensuring stability for the execution plan of a statement is yet another one! In a previous post, we discussed a method for detecting and verifying a change of plan. So we assume that we have verified the change of plan and we have also identified some older plan as the "good plan" that we would like to revert to.In this post, we would like to deal with the case of how we can quickly revert to an older plan, (one that hopefully lies in the cursor cache or/and in the AWR repository) and thus fix the production problem efficiently and in a limited time-window. To this end, we will use the "SQL Plan Management" features of Oracle 11g, which apart from solving the above problem, will also ensure the required stability of the execution plan.
SQL Plan Management in the broader sense, comprises the methods Oracle offers for controlling the execution plan(s) of a SQL statement. In this case, we want a method for forcing a specific execution plan, (the one that we have identified as the "good plan") for the SQL statement with the performance problem and guarantee that the optimizer will not divert from the plan in future executions.
There are several ways with which you can force a specific plan in Oracle and guarantee plan stability. From 11g and onwards the recommended method for achieving this is called: SQL Plan Management with the use of SQL Plan Baselines. For more detailed info about SQL Plan Baselines read this previous post.
Please note that prior to SQL Plan Baselines (e.g. in versions before 11g) one could achieve the same result with one for the following:
- A special type of SQL Profiles, as described in this previous post.
- Stored Outlines (deprecated in Oracle 11g). See here or here.
The Steps of How to Force a Plan with an SQL Plan Baseline
STEP 1: Identify the sql id of the statement in questionAs trivial as it might sound, the very first step is to get the sql id of the problematic statement. We have described how you do this in this post.
STEP 2: Identify the target plan and where is located (Cursor Cache or AWR)
Once we have the sql id, then the next step is to identify which is the older "good plan" that we would like to revert to. To do this, we have to search the execution history for this specific sql id. The details of how you do this are in this post. Also, the next steps depend on whether the target plan is loaded in the cursor cache (i.e., the Library Cache in the Shared Pool). If it is, then we simply have to load the plan into an SQL Plan Baseline and we are over. Otherwise, the target plan must be loaded in AWR and the process is a bit more complex (see next). If it is in neither place, then there is no target plan to revert to and we can forget about it altogether! :-)
STEP 3: If the target plan is in the Cursor Cache then load it into an SQL Plan Baseline and finish
Now, if the target plan is loaded in the cursor cache then we only have to load it into an SQL Plan Baseline. This takes place with a simple call to function DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE, like this:
declare
i pls_integer;
begin
i:= DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE (
sql_id => '&sql_id',
plan_hash_value => &phv);
end;
That' s all there is to it. This call makes the target plan an "accepted plan" and thus the next time this sql id appears the optimizer will go with this plan.
STEP 4: If the target plan is in the AWR then identify the Begin and End AWR snapshot ids
If the plan is in the AWR, then first we have to identify the begin and end AWR snapshot ids of the AWR snapshots that have stored the target plan. This is easy by querying the view DBA_HIST_SQLSTAT (also you can use the script fs_awr.sql described in this post). At a bare minimum the query will be something like this:
SELECT snap_id
FROM dba_hist_sqlstat
WHERE
sql_id = '&sql_id' AND plan_hash_value = &phv
ORDER BY 1
STEP 5: (Target Plan in AWR) Load plan into an SQL Tuning Set (STS)
Once we have the begin and end snapshot ids, we have to load the plan into an SQL Tuning Set, because the SQL Plan Baseline will be loaded with our plan via this STS. Before we load the target plan into an STS, we first have to create one. For this we have to use a call like this:
-- create the STS
BEGIN
DBMS_SQLTUNE.CREATE_SQLSET(
sqlset_name => 'nikos_sql_tuning_set',
description => 'test loading a plan from AWR into an STS',
sqlset_owner => 'NIKOS');
END;
Now, that the STS is created we just have to loaded it with our target plan. This can be achieved with the following piece of code:
-- load the STS from AWR (you need to specify the begin and end snaphsots)
DECLARE
mycursor DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
OPEN mycursor FOR
SELECT VALUE(p)
FROM TABLE (DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY (
begin_snap => 850,
end_snap => 851,
basic_filter => 'sql_id = ''&sql_id'' AND plan_hash_value = &phv')) p;
DBMS_SQLTUNE.LOAD_SQLSET(
sqlset_name => 'nikos_sql_tuning_set',
populate_cursor => mycursor);
END;
Note that we have to pass as input the begin and end snapshot ids as well as filtering information so as to get the specific plan for the specific sql id.
STEP 6: (Target Plan in AWR) Load plan into a SQL Plan Baseline from the STS
As the final step we have to create an SQL Plan Baseline and load it with the target plan from the STS. This takes place with a call like the following:
declare
i pls_integer;
begin
i := DBMS_SPM.LOAD_PLANS_FROM_SQLSET (
sqlset_name => 'nikos_sql_tuning_set',
sqlset_owner => 'NIKOS');
end;
And that is all! Now we have an SQL Plan Baseline loaded with our target plan as an "accepted" plan and whenever Oracle tries to execute the specific statement will choose this plan. We can verify that our plan is loaded in a SQL Plan Baseline by a simple query on DBA_SQL_PLAN_BASELINES view (or *_SQL_PLAN_BASELINES in general).
This way, we can very fast fix the problem with the problematic statement and allow the ETL flows to continue execution, so as to meet the business need. It goes without saying, that this is a nice and fast solution but does not gives us an excuse for omitting to find the root-cause for the change of plan. Now that we have got the management pressure off our back, we have all the required time to investigate the problem, as a true professional that we are.
Next we will give a full example of the above process.
An Example
Lets see an example of forcing a plan with an SQL Plan Baseline, So lets say we have a query that corresponds to a specific sql id and is executed with a specific execution plan that performance-wise makes us very happy. It is shown in Listing 1.
SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount
FROM sh.sales s, sh.times t, sh.customers c, sh.channels ch
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND s.channel_id = ch.channel_id
AND c.cust_state_province = 'CA'
AND ch.channel_desc in ('Internet','Catalog')
AND t.calendar_quarter_desc IN ('1999-01','1999-02')
GROUP BY ch.channel_class, c.cust_city, t.calendar_quarter_desc
sql_id ==> frxcws72k2zt2
Plan hash value: 3426328104
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)| Pstart| Pstop | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 1315 (100)| | | | | |
| 1 | HASH GROUP BY | | 22 | 1315 (69)| | | 744K| 744K| 1131K (0)|
|* 2 | HASH JOIN | | 15M| 491 (17)| | | 1001K| 1001K| 1082K (0)|
|* 3 | HASH JOIN | | 5403 | 412 (1)| | | 862K| 862K| 1288K (0)|
|* 4 | TABLE ACCESS FULL | CUSTOMERS | 3249 | 406 (1)| | | | | |
|* 5 | HASH JOIN | | 11738 | 6 (34)| | | 855K| 855K| 1225K (0)|
| 6 | PART JOIN FILTER CREATE | :BF0000 | 181 | 2 (0)| | | | | |
|* 7 | TABLE ACCESS FULL | TIMES | 181 | 2 (0)| | | | | |
| 8 | PARTITION RANGE JOIN-FILTER| | 118K| 3 (34)|:BF0000|:BF0000| | | |
| 9 | TABLE ACCESS FULL | SALES | 118K| 3 (34)|:BF0000|:BF0000| | | |
|* 10 | TABLE ACCESS FULL | CHANNELS | 14296 | 2 (0)| | | | | |
----------------------------------------------------------------------------------------------------------------------
Listing 1: An example query and its execution plan.
One day though, something happened and the optimizer decided to go astray and choose a different plan. One that caused us a "management headache"!
In order, to invoke the change of plan for our running example, we will tamper with the statistics of the CUSTOMERS table and make it look as a cute small table:
In order, to invoke the change of plan for our running example, we will tamper with the statistics of the CUSTOMERS table and make it look as a cute small table:
sys@NIKOSDB> exec dbms_stats.set_table_stats('SH', 'CUSTOMERS', numrows=>1, numblks=>1)
PL/SQL procedure successfully completed.
Listing 2: Set the statistics of CUSTOMERS to invoke a change of plan.
Plan hash value: 593420798
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)| Pstart| Pstop | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 503 (100)| | | | | |
| 1 | HASH GROUP BY | | 22 | 503 (4)| | | 744K| 744K| 1147K (0)|
|* 2 | HASH JOIN | | 136K| 497 (3)| | | 888K| 888K| 382K (0)|
|* 3 | TABLE ACCESS FULL | CHANNELS | 964 | 2 (0)| | | | | |
|* 4 | HASH JOIN | | 707 | 494 (2)| | | 855K| 855K| 1266K (0)|
| 5 | PART JOIN FILTER CREATE | :BF0000 | 181 | 2 (0)| | | | | |
|* 6 | TABLE ACCESS FULL | TIMES | 181 | 2 (0)| | | | | |
|* 7 | HASH JOIN | | 7129 | 491 (2)| | | 862K| 862K| 1289K (0)|
|* 8 | TABLE ACCESS FULL | CUSTOMERS | 3341 | 2 (0)| | | | | |
| 9 | PARTITION RANGE JOIN-FILTER| | 118K| 488 (2)|:BF0000|:BF0000| | | |
| 10 | TABLE ACCESS FULL | SALES | 118K| 488 (2)|:BF0000|:BF0000| | | |
----------------------------------------------------------------------------------------------------------------------
Listing 3: The new plan for our query. This will play the role of the "bad" plan.We have a specific query. This query has an sql id and this is "frxcws72k2zt2". Our query used to run efficiently with plan 3426328104 and one day the plan changed and plan 593420798 appeared. So we have the current plan (593420798) which is causing the trouble (we will call it the "bad plan") and an older plan that we would like to revert to (3426328104, i.e., the "good plan").
Now, if we check the execution plan history for this sql id (as we have shown in this post ), then we can see the bad plan stored in the cursor cache, and the good plan stored in AWR. See Listing 4 below,
sys@NIKOSDB> @fs_plans
********************************************************
Find the available execution plans for a specific SQL_ID
********************************************************
Session altered.
Elapsed: 00:00:00.01
Enter value for sql_id: frxcws72k2zt2
SOURCE INST_ID SQL_ID PLAN_HASH_VALUE EXECUTIONS AVG_ET_SECS AVG_PX COST TIMESTAMP PARSING_SCHEMA_NAME
-------------------- ---------- --------------- --------------- --------------- ----------- ------- ----------- ------------------------- ----------------
gv$sqlarea_plan_hash 1 frxcws72k2zt2 593420798 23 .00 0 503 28-02-2015 16:27:50 SYS
dba_hist_sql_plan frxcws72k2zt2 3426328104 18 .00 0 1315 28-02-2015 14:35:08
dba_hist_sql_plan frxcws72k2zt2 593420798 8 .01 0 503 28-02-2015 15:09:56
dba_hist_sql_plan frxcws72k2zt2 1018976501 1 .35 0 40 28-02-2015 14:32:57
gv$sqlarea_plan_hash 1 frxcws72k2zt2 1720392155 2 1.14 0 12 28-02-2015 16:26:39 SYS
dba_hist_sql_plan frxcws72k2zt2 769839257 2 1.39 0 5 28-02-2015 14:33:59
dba_hist_sql_plan frxcws72k2zt2 1720392155 1 2.43 0 12 28-02-2015 15:06:21
gv$sqlarea_plan_hash 1 frxcws72k2zt2 769839257 2 2.79 0 5 28-02-2015 14:33:59 SYS
8 rows selected.
Listing 4:The execution plan history of our query. See the bad plan (in red) stored in the cursor cache (exposed via view gv$sqlarea_plan_hash) and the good plan (in green) stored in the AWR (exposed via view dba_hist_sql_plan).
Now we have a clear goal: we want to force the optimizer from now on to use the good plan instead of the bad plan. To this end we will use SQL Plan Baselines. In Listing 5, we call a script (sqltune_fplan.sql) that comprises all the aforementioned steps in order to achieve this goal and we depict the output of this script.
sys@NIKOSDB> @sqltune_fplan
Wrote file sqlplus_settings
Parameter 1:
SQL_ID (required)
Enter value for 1: frxcws72k2zt2
Parameter 2:
PLAN_HASH_VALUE (required)
Enter value for 2: 3426328104
Parameter 3:
Specify the number of days back from SYSDATE for which you want to search the AWR (default 10):
Parameter 4:
Loaded plan is used as a fixed plan Y/N (default N):
***INFO***: The requested plan was NOT found in cursor cache (GV$SQL).
***INFO***: The requested plan must be searched in AWR (DBA_HIST_SQLSTAT)
***INFO***: The requested plan was found in AWR.
***INFO***: The requested plan has been loaded into the SQL Plan Baseline. Lets check DBA_SQL_PLAN_BASELINES ...
SQL_ID SIGNATURE SQLH PN ENABLED ACCEPTED FIXED DESCRIPTION CREATOR ORIGIN PARSING_SCHEMA_NAME SQL_TEXT
--------------- ---------- ------------------------------ ------------------------------ ---------- ---------- ---------- ---------------------------------------------------------------------------------------------------- -------------------- -------------------- ------------------------------ --------------------------------------------------------------------------------------------------------------------
frxcws72k2zt2 1.2622E+19 SYS_SQL_af2aba07d70e92d8 SQL_PLAN_ayapu0zbhx4qsaf4f8cb0 YES YES NO SYS MANUAL-LOAD SYS SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount
*** And the plan is the following ...
(note that the plan hash value might be different)
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
SQL handle: SYS_SQL_af2aba07d70e92d8
SQL text: SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc,
SUM(s.amount_sold) sales_amount FROM sh.sales s, sh.times t,
sh.customers c, sh.channels ch WHERE s.time_id = t.time_id AND
s.cust_id = c.cust_id AND s.channel_id = ch.channel_id AND
c.cust_state_province = 'CA' AND ch.channel_desc in
('Internet','Catalog') AND t.calendar_quarter_desc IN
('1999-01','1999-02') GROUP BY ch.channel_class, c.cust_city,
t.calendar_quarter_desc
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Plan name: SQL_PLAN_ayapu0zbhx4qsaf4f8cb0 Plan id: 2941226160
Enabled: YES Fixed: NO Accepted: YES Origin: MANUAL-LOAD
--------------------------------------------------------------------------------
Plan hash value: 3426328104
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 84 | 507 (4)| 00:00:07 | | |
| 1 | HASH GROUP BY | | 1 | 84 | 507 (4)| 00:00:07 | | |
|* 2 | HASH JOIN | | 1 | 84 | 506 (4)| 00:00:07 | | |
|* 3 | HASH JOIN | | 1 | 63 | 503 (4)| 00:00:07 | | |
|* 4 | TABLE ACCESS FULL | CUSTOMERS | 1 | 26 | 2 (0)| 00:00:01 | | |
|* 5 | HASH JOIN | | 62 | 2294 | 501 (4)| 00:00:07 | | |
| 6 | PART JOIN FILTER CREATE | :BF0000 | 1 | 16 | 2 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | TIMES | 1 | 16 | 2 (0)| 00:00:01 | | |
| 8 | PARTITION RANGE JOIN-FILTER| | 918K| 18M| 494 (3)| 00:00:06 |:BF0000|:BF0000|
| 9 | TABLE ACCESS FULL | SALES | 918K| 18M| 494 (3)| 00:00:06 |:BF0000|:BF0000|
|* 10 | TABLE ACCESS FULL | CHANNELS | 1 | 21 | 2 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."CHANNEL_ID"="CH"."CHANNEL_ID")
3 - access("S"."CUST_ID"="C"."CUST_ID")
4 - filter("C"."CUST_STATE_PROVINCE"='CA')
5 - access("S"."TIME_ID"="T"."TIME_ID")
7 - filter("T"."CALENDAR_QUARTER_DESC"='1999-01' OR "T"."CALENDAR_QUARTER_DESC"='1999-02')
10 - filter("CH"."CHANNEL_DESC"='Catalog' OR "CH"."CHANNEL_DESC"='Internet')
Listing 5:
Output of the script sqltune_fplan.sql which forces a plan for a sql id with the use of SQL Plan Baselines.We can recognize our "good plan" not only from the plan hash value (3426328104 in our case - note: that the plan hash value of a plan in an SQL Plan Baseline might be different but that does not mean that it is a different plan, it is just that the hash value is computed differently) but also from the join order also, which returned to SALES-->TIMES-->CUSTOMERS-->CHANNELS.
If we attempt to rerun the query and retrieve the execution plan with a call to DBMS_XPLAN.DISPLAY_CURSOR, then we can see also a note at the end saying that our plan from the SQL Plan Baseline was chosen. See this in Listing 6 below.
Plan hash value: 3426328104
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)| Pstart| Pstop | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 507 (100)| | | | | |
| 1 | HASH GROUP BY | | 1 | 507 (4)| | | 744K| 744K| 1134K (0)|
|* 2 | HASH JOIN | | 1 | 506 (4)| | | 1001K| 1001K| 1075K (0)|
|* 3 | HASH JOIN | | 1 | 503 (4)| | | 862K| 862K| 1294K (0)|
|* 4 | TABLE ACCESS FULL | CUSTOMERS | 1 | 2 (0)| | | | | |
|* 5 | HASH JOIN | | 62 | 501 (4)| | | 855K| 855K| 1198K (0)|
| 6 | PART JOIN FILTER CREATE | :BF0000 | 1 | 2 (0)| | | | | |
|* 7 | TABLE ACCESS FULL | TIMES | 1 | 2 (0)| | | | | |
| 8 | PARTITION RANGE JOIN-FILTER| | 918K| 494 (3)|:BF0000|:BF0000| | | |
| 9 | TABLE ACCESS FULL | SALES | 918K| 494 (3)|:BF0000|:BF0000| | | |
|* 10 | TABLE ACCESS FULL | CHANNELS | 1 | 2 (0)| | | | | |
----------------------------------------------------------------------------------------------------------------------
Note
-----
- SQL plan baseline SQL_PLAN_ayapu0zbhx4qsaf4f8cb0 used for this statement
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
Listing 6:Verify that the "good plan" from the SQL Plan Baseline is used.1. to force the good plan for our query fast and easy.
2. to guarantee plan stability for our query.
The Script
The script sqltune_fplan.sql for forcing a plan for a specific sql id that we have used in the above example can be found in my GitHub repository, here. In addition, the script fs_plans.sql for showing all the available execution plans for a specific sql id that we have used above can be found here.
Summary
In this post we have shown a method for quickly dealing with a sudden change of the execution plan of an sql statement. Changes in the execution plan can cause serious performance problems to your daily ETL flows, or to the refreshing of your daily standard reports. When this type of problem shows up, you need a way to deal with it fast, or your SLA with your customers might be at risk. The method that we have shown here is to revert to an older better plan with the use of SQL Plan Baselines. With this method we have managed to force a specific plan, as well as guarantee plan stability for our query.Please note, that this method is just a quick remedy for the problem in order to troubleshoot the incident in your production DW as fast as possible. You must do your homework and at a later time investigate the root cause for the change of plan and then decide which would be the most appropriate permanent solution.
In order to better understand the 11g SQL Plan Management features and the various options with SQL Plan Baselines, we suggest that you read the material in this post.
Hi Nikos -
ReplyDeleteGood compilation with examples.
I am looking out for sqltune_fplan.sql and GitHub repository link no more exists.
Would you share the SQL please
Thank you!
Sree
Hi Sree,
Deletethank you for your comments. I have updated the links in the post. The new link is:
https://github.com/nkarag/oracle_ready2run_scripts/blob/master/sqltune_fplan.sql
cheers,
Nikos
Good Morning Nikos.
ReplyDeleteThank you!..Book marked.
Regards
Sree