Friday, November 13, 2015

Oracle SQL Tuning for Day-to-Day Data Warehouse Support

How many times have you been called because a Data Warehouse task is taking too long? The underlying SQL statement has been running for hours now and the SLA to the business is at risk! Does this sound familiar?

If you have ever worked as a DBA, Support Engineer, or even a Developer for a production data warehouse, then the above situation must sound quite familiar. In real-world, large data warehouses, solving SQL performance problems is a daily task, and one that we quite often must perform during "anti-social" hours (e.g., after midnight, very early in the morning, or during weekends).


Monday, July 27, 2015

Right-Deep Join Trees and Star Schema Queries

There are many trees out there, but what is your favorite join-tree?

Join trees are a graphical representation of the way a series of joins (an N-way join, as it is called scientifically) is executed. The most common form of join tree is the left-deep join tree. Actually, these are the only ones that the CBO (i.e., the Oracle Cost Based Optimizer) considers when it chooses a join method other than a Hash Join (i.e., Nested Loops, Merge Join, and Cartesian Joins). For Hash Joins in particular, the CBO also considers right-deep join trees from 11g onward.

In this post, we will describe why right-deep join trees are important for the execution of star schema queries, which are the bread and butter of Data Warehouses. We will show that right-deep join trees make more efficient use of the workareas (i.e., the memory areas in the PGA used when a hash join, group by, sort, etc. operation is executed) during the execution of the series of hash joins included in a typical star schema query.
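
To make this concrete, here is a minimal sketch of how one can ask the CBO for a right-deep tree of hash joins. It assumes the SH sample schema (SALES fact, PRODUCTS and TIMES dimensions) is available; LEADING fixes the join order, USE_HASH the join method, and SWAP_JOIN_INPUTS puts the second dimension on the build side, so the fact table is probed once and streams through the stacked hash tables.

    -- Sketch only, assuming the SH sample schema.
    -- The hints request hash joins with the dimensions as build inputs,
    -- which yields a right-deep join tree: the fact table is scanned once
    -- and probes the hash tables built on PRODUCTS and TIMES.
    SELECT /*+ LEADING(p s t) USE_HASH(s t) SWAP_JOIN_INPUTS(t) */
           t.calendar_year,
           p.prod_name,
           SUM(s.amount_sold) AS total_sold
    FROM   products p, sales s, times t
    WHERE  s.prod_id = p.prod_id
    AND    s.time_id = t.time_id
    GROUP BY t.calendar_year, p.prod_name;

If you check the resulting plan (e.g., with DBMS_XPLAN.DISPLAY_CURSOR), both hash joins should show a dimension table as the build (upper) input.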

Sunday, June 7, 2015

Recursive Subquery Factoring and how to "de-LISTAGG" a list of strings

Have you ever used "recursive subquery factoring"? It is a fantastic new feature available since 11g Release 2 and is basically subquery factoring with ... well, recursion! :-)

If you have never heard of "subquery factoring" before, then maybe you have used a "with subquery", which is the same thing. If you have never used a "with subquery" either, then you should probably read some relevant info first before proceeding with this post, e.g., this post.

Many say that recursive subquery factoring is the replacement for "connect by" queries (aka "hierarchical queries"), since it is included in the ANSI SQL standard and indeed one can implement any type of hierarchical query with it. However, recursive factored subqueries are much more powerful and can be used in a much broader range of use cases.

In this post, we will give a couple of simple examples of recursive subquery factoring (RSF) and then use it in order to solve an interesting problem (one that is not a "hierarchical query" problem). The problem is how to "undo" a LISTAGG operation.

In other words, we will show how we can use recursive subquery factoring to turn a list of string elements into a set of individual rows, where each row stores a single string element. This solution, thanks to the use of recursion, is really elegant!
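
As a teaser, here is a minimal, self-contained sketch of the idea (the input list is just a literal for the example): at each step of the recursion we peel off the element before the first comma and recurse on the remainder.

    -- Split a comma-separated list into one row per element using a
    -- recursive WITH clause (requires 11g Release 2 or later).
    WITH input (str) AS (
      SELECT 'red,green,blue,yellow' FROM dual
    ),
    split (element, rest) AS (
      -- anchor member: take the first element, keep the remainder
      SELECT CASE WHEN INSTR(str, ',') > 0
                  THEN SUBSTR(str, 1, INSTR(str, ',') - 1)
                  ELSE str END,
             CASE WHEN INSTR(str, ',') > 0
                  THEN SUBSTR(str, INSTR(str, ',') + 1)
                  ELSE NULL END
      FROM   input
      UNION ALL
      -- recursive member: repeat on the remainder until nothing is left
      SELECT CASE WHEN INSTR(rest, ',') > 0
                  THEN SUBSTR(rest, 1, INSTR(rest, ',') - 1)
                  ELSE rest END,
             CASE WHEN INSTR(rest, ',') > 0
                  THEN SUBSTR(rest, INSTR(rest, ',') + 1)
                  ELSE NULL END
      FROM   split
      WHERE  rest IS NOT NULL
    )
    SELECT element FROM split;

Running this returns four rows: red, green, blue, yellow.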

Friday, May 15, 2015

Conditional Execution and Conditional PL/SQL Compilation in SQL*Plus

Scripting with SQL*Plus is a must for any decent Oracle professional! Usually for simple day-to-day tasks, but quite often for more complex ones as well, we have to implement our "program logic" with SQL*Plus commands, SQL statements, and anonymous PL/SQL blocks, all embedded within an SQL*Plus script.

Quite often the need arises to include some "conditional execution" logic in our script, i.e., we want to execute only certain parts of the script according to some condition. This is also called "branching" and is analogous to an IF-THEN-ELSE construct in a programming language. The condition to be evaluated can change dynamically each time our script is executed (e.g., because it is based on user input).
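
One common way to do this (a sketch only; the two script file names below are hypothetical) is to have a query compute the name of the script to run into a substitution variable, and then simply @ that variable:

    -- SQL*Plus "branching" sketch: derive the script name from a condition
    -- into the substitution variable script_to_run, then run that file.
    -- run_weekly.sql and run_daily.sql are hypothetical file names.
    COLUMN script_name NEW_VALUE script_to_run NOPRINT
    SELECT CASE
             WHEN TO_CHAR(SYSDATE, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH') = 'SUN'
             THEN 'run_weekly.sql'
             ELSE 'run_daily.sql'
           END AS script_name
    FROM   dual;
    @&script_to_run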

Moreover, there comes the time when, for an anonymous PL/SQL block embedded in a SQL*Plus script, you want to direct the compiler to omit a specific part of the PL/SQL code based on some condition (e.g., the Oracle version where the script runs, some user input, etc.). This is called "conditional compilation" and is different from conditional execution.

In a programming language (as PL/SQL is) there exist conditional execution constructs (e.g., IF-THEN-ELSE), so conditional execution is not an issue. What we want to achieve with conditional compilation is to make a part of the PL/SQL code "invisible" to the compiler each time we run our script, based on some dynamic condition. For example, we might want to avoid a compilation error caused by the Oracle version that our script runs on, so we have to "hide" different parts of the code depending on the Oracle version.

In traditional PL/SQL, this is achieved with conditional compilation directives, statements, and identifiers prefixed with a dollar sign ($) (read this great article for details on how this is done in PL/SQL). The question is: how can you do it for an anonymous PL/SQL block within a SQL*Plus script, where compilation takes place "on the fly"?
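
For a flavor of the directives themselves, here is a minimal sketch of the simplest case, a version-dependent anonymous block using the standard DBMS_DB_VERSION package; the $IF branch is chosen when SQL*Plus submits the block for compilation, so only one path is ever compiled:

    -- Conditional compilation in an anonymous PL/SQL block: the branch is
    -- selected at compile time based on the database version.
    SET SERVEROUTPUT ON
    BEGIN
      $IF DBMS_DB_VERSION.VER_LE_10_2 $THEN
        dbms_output.put_line('pre-11g code path');
      $ELSE
        dbms_output.put_line('11g-or-later code path');
      $END
    END;
    /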

In this post, we will show how one can achieve both conditional execution and conditional PL/SQL compilation in an SQL*Plus script.

Friday, March 13, 2015

Join hints and join ordering, or, why is the optimizer ignoring my join hint?

Have you ever tried to use a join hint (e.g., USE_NL, or USE_HASH) in a query, only to realize that the query optimizer stubbornly refuses to obey?

No matter how frustrating this might be for you, there is an explanation for this behavior. One of the main reasons is the following rule:

Oracle first decides on the join order and then on the join method to use.

In this post, we will describe how join ordering can be the reason the optimizer ignores a hint that forces a specific join method (e.g., nested loops, hash join, sort merge join, etc.). Moreover, we will show how one can use the USE_NL or USE_HASH hints (or any hint that forces a specific join method) in combination with the LEADING (or ORDERED) hint, i.e., hints that force a specific join order.
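
As a quick illustration (the table and column names here are made up for the example), fixing the join order first makes the join-method hint unambiguous:

    -- Sketch: LEADING fixes the join order (customers drives), and USE_NL
    -- asks for nested loops when orders is joined to it.
    -- customers and orders are hypothetical tables.
    SELECT /*+ LEADING(c o) USE_NL(o) */
           c.cust_name,
           o.order_total
    FROM   customers c,
           orders    o
    WHERE  o.cust_id     = c.cust_id
    AND    c.cust_region = 'EMEA';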

Sunday, March 1, 2015

Using SQL Plan Baselines to Revert to an Older Plan

It's 10:00 AM and your nightly ETL flows are still running! The daily reports should have been sent to management by 8:00! From your investigation you find out that a specific statement (e.g., one that corresponds to an ETL mapping) is taking much longer than usual (it used to run in minutes and now it has been running for 4 hours!). Your first thought is that some change in the execution plan must have taken place. How do you fix this? Or better, how do you fix this fast (remember, management is waiting for the report)?

The Problem

SQL performance degradation due to a change of plan is one problem; fixing the problem fast, so that the stalled production ETL flows can move on, is another; ensuring stability for the execution plan of a statement is yet another! In a previous post, we discussed a method for detecting and verifying a change of plan. So we assume that we have verified the change of plan and have also identified some older plan as the "good plan" that we would like to revert to.

In this post, we would like to deal with how we can quickly revert to an older plan (one that hopefully still lies in the cursor cache and/or in the AWR repository) and thus fix the production problem efficiently and within a limited time window. To this end, we will use the "SQL Plan Management" features of Oracle 11g, which, apart from solving the above problem, will also ensure the required stability of the execution plan.
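
As a taste of what follows, here is a hedged sketch of the cursor-cache case: DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE loads the known good plan as a (fixed) SQL plan baseline. The sql_id and plan_hash_value are supplied as SQL*Plus substitution variables; you would use the values you identified as the "good plan".

    -- Load the "good" plan from the cursor cache as a fixed baseline (11g+).
    SET SERVEROUTPUT ON
    DECLARE
      l_plans PLS_INTEGER;
    BEGIN
      l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                   sql_id          => '&sql_id',
                   plan_hash_value => &plan_hash_value,
                   fixed           => 'YES');
      dbms_output.put_line(l_plans || ' plan(s) loaded');
    END;
    /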

Wednesday, February 11, 2015

Parallel Refresh of a Materialized View

This is a quick post regarding materialized view refresh.

Refresh is invoked with a call to the procedure DBMS_MVIEW.REFRESH. For large MVs it is highly desirable that the refresh takes place in parallel. The "parallelism" parameter of the REFRESH procedure makes you think that this is the right tool for the job. Nope!

As attractive as it might seem, this parameter does not help with the parallelism of the refresh. You have to include a parallel hint in the SELECT part of the MV's defining statement (CREATE MATERIALIZED VIEW) and/or set a parallel degree greater than 1 on the MV's base tables.
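
In other words, a sketch along these lines (the object names sales and sales_mv are hypothetical) is what actually gets you a parallel refresh:

    -- Put the parallelism in the MV definition and/or on the base table,
    -- not in the refresh call. sales and sales_mv are hypothetical names.
    CREATE MATERIALIZED VIEW sales_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS
    SELECT /*+ parallel(s 8) */
           s.prod_id,
           SUM(s.amount_sold) AS total_sold
    FROM   sales s
    GROUP  BY s.prod_id;

    -- and/or give the base table a parallel degree
    ALTER TABLE sales PARALLEL 8;

    -- the refresh itself is invoked as usual
    EXEC DBMS_MVIEW.REFRESH('SALES_MV', method => 'C')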

Tuesday, February 10, 2015

Detecting a change in the execution plan of a query

My report used to complete in seconds and now is running for hours! Do you know what is going on?

Sounds familiar? It is very common in Data Warehouses to experience a sudden performance degradation in the execution time of a report, or of an individual ETL mapping. The query was running fast for months and then one day everything changed!

Such behavior is a sign of a change in the execution plan. Of course, there are a million other reasons that might have caused the performance degradation, but a change in the execution plan is one of the most common. The reasons for such a change are numerous: stale statistics, a change triggered by some DDL (e.g., a newly created index), a change in the optimizer environment (e.g., a relevant parameter changed at the session level), and so on.

In this post, we want to show how easy it is to detect a change in the execution plan of a statement (technically represented by a unique SQL ID) with the help of the historic data maintained by Oracle in the AWR. To this end, we will present some examples and some simple scripts. However, we will not deal at all with the reasons that might trigger the change of plan. That is a long discussion ...
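
To give a taste, a minimal sketch of such a script is shown below: it lists, for one SQL ID (entered as a substitution variable), the plan hash values recorded in AWR snapshot by snapshot, so a plan change shows up immediately. Note that querying the DBA_HIST_* views requires the Diagnostics Pack license.

    -- One plan_hash_value per AWR snapshot for a given sql_id:
    -- a change of plan appears as a change in plan_hash_value.
    SELECT sn.begin_interval_time,
           st.plan_hash_value,
           st.executions_delta,
           ROUND(st.elapsed_time_delta
                 / NULLIF(st.executions_delta, 0) / 1e6, 1) AS avg_sec_per_exec
    FROM   dba_hist_sqlstat  st,
           dba_hist_snapshot sn
    WHERE  sn.snap_id         = st.snap_id
    AND    sn.dbid            = st.dbid
    AND    sn.instance_number = st.instance_number
    AND    st.sql_id          = '&sql_id'
    ORDER  BY sn.begin_interval_time;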

Wednesday, January 28, 2015

How to effectively tune a query that does not even finish (SQL Monitoring and ASH in action)

Have you ever been called to tune a query that runs forever?

Even worse, what if the query is very complex, you know nothing about the "logic" behind the SQL (since it was written by someone else), and the execution plan, when you finally look at it, is several pages long?
Moreover, what if the query retrieves the list of customers for a business-critical campaign that has to run asap, and therefore the beloved management is looking over your shoulder, waiting for you to solve the problem?

Problem Definition

Never-ending queries, or at least queries that cannot finish within an acceptable time frame, are quite common in large Data Warehouses. Unfortunately, in this situation one cannot use methods where execution statistics are gathered first and the execution steps are analyzed afterwards, simply because the query never finishes!
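
The good news is that both SQL Monitoring and ASH can show what a statement is doing while it is still running. As a hedged sketch (SQL Monitoring requires the Tuning Pack license; the sql_id is supplied as a substitution variable), you can pull a monitoring report for the running statement like this:

    -- Text SQL Monitoring report for a statement that is still running;
    -- it shows the plan with per-step activity and time spent so far.
    SET LONG 1000000 LONGCHUNKSIZE 1000000 PAGESIZE 0 LINESIZE 300 TRIMSPOOL ON
    SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
             sql_id => '&sql_id',
             type   => 'TEXT') AS report
    FROM   dual;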