Monday, April 18, 2016

A Practical Guide to SQL Tuning for Day-to-Day Data Warehouse Support

In real-world large DWs, solving SQL performance problems is a daily task, one that quite often must be performed during "anti-social" hours (i.e., late at night, early in the morning, or during weekends) and under significant time pressure.

In the following article, we provide a practical guide to SQL tuning for day-to-day DW support. We describe the most popular methods that DW engineers use today, in order to solve SQL performance problems, based on our real-world experience. We point out pros and cons of each method, and provide some best-practice recommendations.

The intended audience is DW support engineers, DW developers and DBAs. The methods presented are not DW-specific and pertain to SQL tuning in general, so this article should be useful to a broader audience of database professionals with interest in SQL tuning and not only to Data Warehouse professionals.

You can download the article from here.

Update:
Note that this article has been published in the following two well-known Oracle technical journals:
* ODTUG Technical Journal, May 6, 2016
* RMOUG Newsletter, Nov 2017

Friday, January 29, 2016

Efficient Distributed Queries for Oracle Data Warehouses

One of the most typical query patterns for a data warehouse is to "select" data from a remote site (i.e., a data source) and insert them into a local staging table. Also, in the majority of cases, this statement will include some joins to local tables as well as some filter predicates on both local and remote tables. Let's say that a typical query pattern, in its most basic form, will be something like the following:

INSERT INTO <local staging table>
SELECT <columns of interest>
FROM   <remote_table>@<remote_site> t_remote,
       <local_table> t_local
WHERE
       <join condition between t_remote and t_local>
AND    <filter predicates on t_local>

Very often, the table at the remote site has a significant number of rows. For example, a very common scenario is that the remote table is a large table containing transaction data (e.g., order lines, Call Detail Records, invoice lines etc.) with a timestamp. The goal of the distributed statement is to select only the rows that correspond to a specific time period (e.g., the transactions of the last day, a.k.a. “the daily delta”) and insert them into a local staging table. In order to achieve this, we maintain locally (i.e., at the data warehouse site) a very small “control table” that holds the current data warehouse “running date” (i.e., the date of interest for transaction data).

Therefore, conceptually, we have a join of the very small (local) control table to the very large (remote) transactional table, and this join yields a relatively small number of rows (i.e., the daily delta of transactional data, which is small compared to the whole transactional table). Now, what is the most efficient method to execute this distributed query?
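One option worth considering (a sketch only, not necessarily the method the full post settles on; the table and database-link names are hypothetical) is the DRIVING_SITE hint, which asks the optimizer to execute the join at the remote site, so that the tiny control table is shipped over the network instead of the huge transactional table:

```sql
-- Hypothetical names: order_lines@src_site is the large remote table,
-- etl_control_tab is the tiny local control table holding the running date.
SELECT /*+ DRIVING_SITE(t_remote) */
       t_remote.order_id, t_remote.line_id, t_remote.amount
FROM   order_lines@src_site t_remote,
       etl_control_tab      t_local
WHERE  t_remote.trans_date = t_local.running_date;
```

Note that the hint is shown on a plain SELECT: for an INSERT ... SELECT the DML part must run at the local site, and the DRIVING_SITE hint is commonly reported to be ignored in that case, which is part of what makes this query pattern interesting to tune.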

Friday, November 13, 2015

Oracle SQL Tuning for Day-to-Day Data Warehouse Support

How many times have you been called about a Data Warehouse task that is taking too long? The underlying SQL statement has been running for some hours now and the SLA to the business is at risk! Does it sound familiar?

If you have ever worked as a DBA, Support Engineer, or even a Developer for a production data warehouse, then the above situation must sound quite familiar. Actually, in real-world large data warehouses, solving SQL performance problems is a daily task, one that quite often we must perform during "anti-social" hours (e.g., after midnight, very early in the morning, or during weekends).


Monday, July 27, 2015

Right-Deep Join Trees and Star Schema Queries

There are many trees out there, but what is your favorite join-tree?

Join trees are a graphical representation of the way a series of joins (an N-way join, as it is called scientifically) is executed. The most common form of join tree is the left-deep join tree. Actually, these are the only ones that the CBO (i.e., the Oracle Cost Based Optimizer) considers when it chooses a join method other than a Hash Join (i.e., Nested Loops, Merge Join and Cartesian Join). For Hash Joins in particular, the CBO also considers right-deep join trees from 11g onward.

In this post, we will describe why right-deep join trees are important for the execution of star schema queries, which are the bread and butter of Data Warehouses. We will show that right-deep join trees make more efficient use of the workareas (i.e., the PGA memory areas used when a hash join, group by, or sort operation is executed) during the execution of the series of hash joins included in a typical star schema query.
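As a quick illustration (a hypothetical star schema; the table names and hint combination are illustrative, not the post's own example), one can nudge the optimizer toward a right-deep tree of hash joins with the LEADING, USE_HASH and SWAP_JOIN_INPUTS hints, so that the small dimension tables become the build inputs and the large fact table is simply probed through their hash tables:

```sql
-- Hypothetical star schema: fact_sales joined to two small dimensions.
-- SWAP_JOIN_INPUTS makes d_prod a build input as well, yielding a
-- right-deep tree in which fact_sales is probed through both hash tables.
SELECT /*+ LEADING(d_time d_prod f)
           USE_HASH(f)
           SWAP_JOIN_INPUTS(d_prod) */
       d_time.month_name, d_prod.category, SUM(f.amount)
FROM   fact_sales  f,
       dim_time    d_time,
       dim_product d_prod
WHERE  f.time_id = d_time.time_id
AND    f.prod_id = d_prod.prod_id
GROUP BY d_time.month_name, d_prod.category;
```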

Sunday, June 7, 2015

Recursive Subquery Factoring and how to "de-LISTAGG" a list of strings

Have you ever used "recursive subquery factoring"? It is a fantastic new feature, available since 11g Release 2, and basically it is subquery factoring with ... well, recursion! :-)

If you have never heard of "subquery factoring" before, then maybe you have used a "with subquery", which is the same thing. If you have never used a "with subquery", then you should probably read some relevant info first before proceeding to this post, e.g., this post.

Many say that recursive subquery factoring is the replacement of "connect by" queries (aka "hierarchical queries"), since it is included in the SQL ANSI standard and indeed one can implement any type of hierarchical query with it. However, recursive factored subqueries are much more powerful and can be used in a much broader range of use cases.

In this post, we will give a couple of simple examples of recursive subquery factoring (RSF) and then use it in order to solve an interesting problem (one that is not a "hierarchical query" problem). The problem is how to "undo" a LISTAGG operation.

In other words, we will show how we can use recursive subquery factoring in order to turn a list of string elements into a set of individual rows, where each row stores a single string element. This solution, because of the use of recursion, is really elegant!
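As a taste of the technique (a self-contained sketch, not necessarily the post's own solution; the input list and the comma delimiter are hypothetical), a recursive factored subquery can peel one element per iteration off the front of the list:

```sql
WITH split (element, remainder) AS (
  -- Anchor member: no element yet, the full list plus a trailing delimiter
  SELECT CAST(NULL AS VARCHAR2(4000)),
         'red,green,blue' || ','
  FROM   dual
  UNION ALL
  -- Recursive member: peel off the element before the first comma,
  -- keep the rest of the list as the new remainder
  SELECT SUBSTR(remainder, 1, INSTR(remainder, ',') - 1),
         SUBSTR(remainder, INSTR(remainder, ',') + 1)
  FROM   split
  WHERE  INSTR(remainder, ',') > 0
)
SELECT element
FROM   split
WHERE  element IS NOT NULL;
```

The recursion stops when the remainder no longer contains a comma, so the query returns one row per element: red, green, blue.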

Friday, May 15, 2015

Conditional Execution and Conditional PL/SQL Compilation in SQL*Plus

Scripting with SQL*Plus is a must for any decent Oracle professional! Usually for simple day-to-day tasks, but quite often for more complex ones too, we have to implement our "program logic" with SQL*Plus commands, SQL statements and anonymous PL/SQL blocks, all embedded within an SQL*Plus script.

Quite often the need arises to include some "conditional execution" logic in our script; that is, we want to execute only certain parts of the script, according to some condition. This is also called "branching" and is analogous to an IF-THEN-ELSE construct in a programming language. The condition to be evaluated can change dynamically each time our script is executed (e.g., because it is based on user input).

Moreover, there comes the time when, for an anonymous PL/SQL block embedded in an SQL*Plus script, you want to direct the compiler to omit a specific part of the PL/SQL code, based on some condition (e.g., the Oracle version where the script runs, some user input, etc.). This is called "conditional compilation" and is different from conditional execution.

In a programming language (such as PL/SQL) there exist conditional execution constructs (e.g., IF-THEN-ELSE), so conditional execution is not an issue. What we want to achieve with conditional compilation is to make a part of the PL/SQL code "invisible" to the compiler each time we run our script, based on some dynamic condition. For example, we might want to avoid a compilation error caused by the Oracle version our script runs on, so we have to "hide" different parts of the code based on the Oracle version.

In traditional PL/SQL, this is achieved with conditional compilation directives, statements, and identifiers prefixed by a dollar sign ($) (read this great article for details on how this is done in PL/SQL). The question is: how can you do it for anonymous PL/SQL within an SQL*Plus script, where compilation takes place "on-the-fly"?

In this post, we will show how one can achieve both conditional execution and conditional PL/SQL compilation in an SQL*Plus script.
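As a taste of the two techniques (a minimal sketch with hypothetical script names; the full post covers the details), conditional execution can be driven by a substitution variable that picks which sub-script to run, while conditional compilation inside an anonymous block uses the standard $IF ... $END directives:

```sql
-- Conditional execution: build the name of the script to run at run time
COLUMN script_name NEW_VALUE v_script NOPRINT
SELECT CASE WHEN '&1' = 'FULL' THEN 'load_full.sql'   -- hypothetical scripts
            ELSE 'load_delta.sql'
       END AS script_name
FROM   dual;
@&v_script

-- Conditional compilation: hide version-specific code from the compiler
BEGIN
  $IF DBMS_DB_VERSION.VER_LE_10_2 $THEN
    NULL;  -- 10.2-compatible code path
  $ELSE
    NULL;  -- code that relies on 11g-only features
  $END
END;
/
```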

Friday, March 13, 2015

Join hints and join ordering, or, why is the optimizer ignoring my join hint?

Have you ever tried to use a join hint (e.g., USE_NL, or USE_HASH) in a query, only to realize that the query optimizer stubbornly refuses to obey?

No matter how frustrating this might be, there is an explanation for this behavior. One of the main reasons is the following rule:

Oracle first decides on the join order and then on the join method to use.

In this post, we will describe how join ordering can be the reason for the optimizer to ignore a hint that forces a specific join method (e.g., nested loops, hash join, sort merge join, etc.). Moreover, we will show how one can use the USE_NL or USE_HASH hints (or any hint that forces a specific join method) in combination with the LEADING (or ORDERED) hint (i.e., hints that force a specific join order).
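For example (an illustrative pair of tables, not the post's own case), a bare USE_NL hint can be silently ignored if the optimizer chooses a join order in which the hinted table is not the inner (probed) table; fixing the join order first with LEADING makes the join-method hint effective:

```sql
-- May be ignored: the optimizer is still free to pick the join order,
-- and USE_NL(o) only applies when o is the inner table of the join
SELECT /*+ USE_NL(o) */ c.cust_name, o.order_date
FROM   customers c, orders o
WHERE  o.cust_id = c.cust_id;

-- Join order fixed first (c, then o), so the nested-loops hint can be honored
SELECT /*+ LEADING(c o) USE_NL(o) */ c.cust_name, o.order_date
FROM   customers c, orders o
WHERE  o.cust_id = c.cust_id;
```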

Sunday, March 1, 2015

Using SQL Plan Baselines to Revert to an Older Plan

It's 10:00 AM and your nightly ETL flows are still running! The daily reports should have been sent to the management by 8:00! From your investigation you find out that a specific statement (e.g., one that corresponds to an ETL mapping) is taking much longer than usual (it used to run in minutes and now it has been running for 4 hours!). Your first thought is that some change in the execution plan must have taken place. How do you fix this? Or better, how do you fix it fast? (Remember, the management is waiting for the report.)

The Problem

SQL performance degradation due to a change of plan is one problem; fixing the problem fast, so as to allow the production ETL flows to catch up, is another; ensuring stability for the execution plan of a statement is yet another! In a previous post, we discussed a method for detecting and verifying a change of plan. So we assume that we have verified the change of plan and have also identified some older plan as the "good plan" that we would like to revert to.

In this post, we deal with how to quickly revert to an older plan (one that hopefully still lies in the cursor cache and/or in the AWR repository) and thus fix the production problem efficiently and within a limited time window. To this end, we will use the "SQL Plan Management" features of Oracle 11g which, apart from solving the above problem, will also ensure the required stability of the execution plan.
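As a rough sketch of the core step (the sql_id and plan_hash_value are placeholders you must supply; the full post covers how to find them and the AWR variant), loading a known-good plan from the cursor cache into a fixed SQL plan baseline looks like this:

```sql
DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- Load the "good" plan from the cursor cache as a FIXED baseline,
  -- so the optimizer prefers it over other accepted plans
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
               sql_id          => '&good_sql_id',
               plan_hash_value => &good_plan_hash_value,
               fixed           => 'YES');
  DBMS_OUTPUT.PUT_LINE(l_plans || ' plan(s) loaded');
END;
/
```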

Wednesday, February 11, 2015

Parallel Refresh of a Materialized View

This is a quick post regarding materialized view refresh.

Refresh is invoked with a call to the procedure DBMS_MVIEW.REFRESH. For large MVs it is highly desirable that the refresh takes place in parallel. The REFRESH procedure parameter "parallelism" makes you think that it is the right tool for this job. Nope!

As attractive as it might seem, this parameter does not help towards the parallelism of the refresh. You have to include a parallel hint in the SELECT part of the defining statement of the MV (CREATE MATERIALIZED VIEW) and/or set a PARALLEL degree > 1 on the base tables of the MV.
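A minimal sketch of the two working options (hypothetical table and MV names; the degree of 8 is arbitrary):

```sql
-- Option 1: a parallel hint inside the MV's defining query
CREATE MATERIALIZED VIEW mv_daily_sales
AS
SELECT /*+ PARALLEL(s 8) */
       s.sale_date, SUM(s.amount) AS total_amount
FROM   sales s
GROUP BY s.sale_date;

-- Option 2 (alternatively, or in addition): a parallel degree on the base table
ALTER TABLE sales PARALLEL 8;

-- The refresh call itself; its "parallelism" parameter is NOT what
-- makes the refresh run in parallel
EXEC DBMS_MVIEW.REFRESH('MV_DAILY_SALES', method => 'C')
```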