Why did a query change its execution plan and how to anticipate the change

Tags: optimization, oracle

We have had a job running a dynamic query in production for years, and up until our last code deploy it took 5-6 seconds to execute. Now the same query takes days to finish. It's a relatively simple nested SELECT that joins 5-6 tables, the largest of which have a few hundred thousand rows (the others are mostly small lookup tables).

The same query runs very quickly on other database instances (with much more data and exactly the same codebase).

I'm well aware that tools like EXPLAIN PLAN only show the predicted execution plan, but the explain plan on this particular instance is very poor (for comparison, it doesn't use an index that all the other instances seem to use).

My question is: with no change to the query, no change to the structure of the tables it uses, and no significant change in the amount of data read, why has the execution plan changed so drastically, and how can I anticipate such a change?

Added info:
The query in question starts with SELECT DISTINCT… We noticed that on this database instance, if we remove the DISTINCT (which then doesn't necessarily give correct results), the execution plan is very similar to the original, much quicker one.

Edit:
Statistics were gathered straight after the last code deploy.

Best Answer

The optimizer takes the parsed representation of the SQL statement and the statistics, and generates the execution plan with the lowest estimated cost. During this process the optimizer generates multiple candidate plans and compares their costs. Execution plans may therefore change whenever the optimizer's inputs (the parsed SQL statement and the statistics) change.
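As a rough illustration of "lowest cost wins", here is a toy Python model, not Oracle's real cost formulas. Two hypothetical candidate plans are costed from the same statistics (a table's row count and a predicate's estimated selectivity), and the cheaper one is kept. The plan names and all cost constants are invented for this sketch. The point is only that a change in the statistics, with the SQL text untouched, can flip the choice.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    num_rows: int       # row count recorded by the last statistics gathering
    selectivity: float  # estimated fraction of rows matching the predicate


def cost_index_nested_loops(stats: TableStats) -> float:
    # Hypothetical cost: a fixed index probe plus one random read per matching row.
    return 3.0 + 1.0 * stats.num_rows * stats.selectivity


def cost_full_scan_hash_join(stats: TableStats) -> float:
    # Hypothetical cost: a cheap sequential read of every row plus a fixed hash build.
    return 0.05 * stats.num_rows + 50.0


def choose_plan(stats: TableStats) -> str:
    # Cost every candidate and keep the cheapest, as a cost-based optimizer does.
    candidates = {
        "INDEX RANGE SCAN + NESTED LOOPS": cost_index_nested_loops(stats),
        "FULL SCAN + HASH JOIN": cost_full_scan_hash_join(stats),
    }
    return min(candidates, key=candidates.get)


# Same SQL, same table size; only the statistics-based selectivity differs.
print(choose_plan(TableStats(num_rows=300_000, selectivity=0.001)))  # INDEX RANGE SCAN + NESTED LOOPS
print(choose_plan(TableStats(num_rows=300_000, selectivity=0.2)))    # FULL SCAN + HASH JOIN
```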

Why Execution Plans Change

Execution plans can and do change as the underlying optimizer inputs change. EXPLAIN PLAN output shows how the database would run the SQL statement at the time the statement was explained. This plan can differ from the actual execution plan a SQL statement uses, because the execution environment and the explain plan environment can differ. The Oracle documentation states that execution plans can differ when the schema changes or when costs change.

Running the same SQL statement in a different database, or under a different schema, can also produce a different plan. Even when the schema and database are the same, a change in the estimated cost of the candidate plans can lead the optimizer to choose a different one. Bind variable values, the size of the data and its statistics, and optimizer parameters all influence the cost.
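The same toy idea can show a parameter changing the plan while the statistics stay fixed. In this sketch the hypothetical `index_cost_adj` argument loosely mimics Oracle's OPTIMIZER_INDEX_COST_ADJ parameter (a percentage applied to the cost of index access, default 100). The cost formulas themselves are invented and are not Oracle's.

```python
def estimated_costs(num_rows: int, matching_rows: int, index_cost_adj: int = 100) -> dict:
    # index_cost_adj is a percentage applied to the index path's cost, loosely
    # mimicking OPTIMIZER_INDEX_COST_ADJ (default 100). Formulas are invented.
    index_access = (3.0 + matching_rows) * index_cost_adj / 100
    full_scan = 0.05 * num_rows + 50.0
    return {"INDEX ACCESS": index_access, "FULL SCAN": full_scan}


def choose(costs: dict) -> str:
    # The cheapest estimated plan wins.
    return min(costs, key=costs.get)


# Identical statistics in both "environments"; only the parameter differs.
print(choose(estimated_costs(300_000, 5_000)))                      # INDEX ACCESS
print(choose(estimated_costs(300_000, 5_000, index_cost_adj=400)))  # FULL SCAN
```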

It is difficult to guess why the execution plan changed without access to your database's metadata (which is dynamic) to query for further investigation, and that investigation is a time-consuming task.

You can use SQL Tuning Advisor, SQLTXPLAN, SQL Trace, etc. to make it easier to find the elements affecting the execution plan.

For details: Oracle Database SQL Tuning Guide

Related Solutions

SQL Server – How TOP Impacts an Execution Plan

I would have guessed that when a query includes TOP n, the database engine would run the query ignoring the TOP clause, and then at the end just shrink that result set down to the n rows that were requested. The graphical execution plan seems to indicate this is the case: TOP is the "last" step. But it appears there is more going on.

The way the above is phrased makes me think you may have an incorrect mental picture of how a query executes. An operator in a query plan is not a step (where the full result set of a previous step is evaluated by the next one).

SQL Server uses a pipelined execution model, where each operator exposes methods like Init(), GetRow(), and Close(). As the GetRow() name suggests, an operator produces one row at a time on demand (as required by its parent operator). This is documented in the Books Online Logical and Physical Operators reference, with more detail in my blog post Why Query Plans Run Backwards. This row-at-a-time model is essential in forming a sound intuition for query execution.
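The row-at-a-time model can be sketched in Python with classes exposing `init`, `get_row` and `close`, loosely mirroring the Init(), GetRow() and Close() methods above. This is a toy, not SQL Server's implementation. `Scan`, `Filter`, `Top` and `run` are hypothetical names. The sketch shows that a TOP(3) above a filter over a million-row scan only pulls the rows it needs.

```python
from typing import Callable, Iterable, Optional


class Scan:
    """Leaf operator: returns rows from its source one at a time."""

    def __init__(self, rows: Iterable[int]):
        self.rows = rows
        self.rows_produced = 0

    def init(self) -> None:
        self._it = iter(self.rows)

    def get_row(self) -> Optional[int]:
        row = next(self._it, None)  # None signals "no more rows"
        if row is not None:
            self.rows_produced += 1
        return row

    def close(self) -> None:
        self._it = None


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[int], bool]):
        self.child, self.predicate = child, predicate

    def init(self) -> None:
        self.child.init()

    def get_row(self) -> Optional[int]:
        while (row := self.child.get_row()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self) -> None:
        self.child.close()


class Top:
    """Returns at most n rows, then stops asking its child for more."""

    def __init__(self, child, n: int):
        self.child, self.n = child, n

    def init(self) -> None:
        self.returned = 0
        self.child.init()

    def get_row(self) -> Optional[int]:
        if self.returned >= self.n:
            return None
        row = self.child.get_row()
        if row is not None:
            self.returned += 1
        return row

    def close(self) -> None:
        self.child.close()


def run(root) -> list:
    """The client pulls rows from the root operator until it is exhausted."""
    root.init()
    rows = []
    while (row := root.get_row()) is not None:
        rows.append(row)
    root.close()
    return rows


scan = Scan(range(1_000_000))
print(run(Top(Filter(scan, lambda r: r % 2 == 0), 3)))  # [0, 2, 4]
print(scan.rows_produced)                                 # 5
```

Because `Top` simply stops calling `get_row` on its child once it has 3 rows, the scan produces 5 rows instead of 1,000,000, even though TOP appears as the "last" operator in the plan.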

My question is, how (and why) does a TOP n clause impact the execution plan of a query?

Some logical operations like TOP, semi joins and the FAST n query hint affect the way the query optimizer costs execution plan alternatives. The basic idea is that one possible plan shape might return the first n rows more quickly than a different plan that was optimized to return all rows.

For example, indexed nested loops join is often the fastest way to return a small number of rows, though hash or merge join with scans might be more efficient on larger sets. The way the query optimizer reasons about these choices is by setting a Row Goal at a particular point in the logical tree of operations.

A row goal modifies the way query plan alternatives are costed. The essence of it is that the optimizer starts by costing each operator as if the full result set were required, sets a row goal at the appropriate point, and then works back down the plan tree estimating the number of rows it expects to need to examine to meet the row goal.

For example, a logical TOP(10) sets a row goal of 10 at a particular point in the logical query tree. The costs of operators leading up to the row goal are modified to estimate how many rows they need to produce to meet the row goal. This calculation can become complex, so it is easier to understand all this with a fully worked example and annotated execution plans. Row goals can affect more than the choice of join type or whether seeks and lookups are preferred to scans. More details on that here.
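A much simplified version of the row goal arithmetic can be sketched as follows. This toy makes two assumptions. First, matching rows are spread uniformly, so producing n results means reading about n / selectivity input rows. Second, the per-row cost units are invented. The hash join's build side is blocking, so it must read its whole build input whatever the row goal, while the nested loops plan's cost shrinks with the number of rows needed.

```python
from typing import Optional


def rows_needed(row_goal: Optional[int], total_rows: int, selectivity: float) -> float:
    # Without a row goal the whole input is needed. With one, assume matches are
    # spread evenly: one result per 1/selectivity input rows, capped at the total.
    if row_goal is None:
        return total_rows
    return min(total_rows, row_goal / selectivity)


def choose_join(row_goal: Optional[int] = None, outer_rows: int = 100_000,
                inner_rows: int = 100_000, selectivity: float = 0.5) -> str:
    r = rows_needed(row_goal, outer_rows, selectivity)
    costs = {
        # Pipelined: 1 unit to read each outer row plus 40 units for its index seek.
        "NESTED LOOPS (index seek)": r * (1 + 40),
        # Blocking build: the whole inner input is hashed before any row comes out,
        # so only the probe side shrinks with the row goal.
        "HASH JOIN": inner_rows * 1 + r * 1,
    }
    return min(costs, key=costs.get)


print(choose_join())             # HASH JOIN
print(choose_join(row_goal=10))  # NESTED LOOPS (index seek)
```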

As always, an execution plan selected on the basis of a row goal is subject to the optimizer's reasoning abilities and the quality of information provided to it. Not every plan with a row goal will produce the required number of rows faster in practice, but according to the costing model it will.
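It is easy to reproduce why a row goal plan can disappoint. In this sketch both hypothetical table layouts have identical statistics (100,000 rows, 1,000 matches). A uniformity-based estimate for a TOP(10) is therefore the same for both. Yet the number of rows actually read differs by two orders of magnitude when the matches are clustered at the end of the scan order.

```python
TOTAL_ROWS = 100_000
MATCHING_ROWS = 1_000
ROW_GOAL = 10


def rows_read_until(predicate, total_rows: int, row_goal: int) -> int:
    """Scan rows 0..total_rows-1 in order, stopping once row_goal matches are found."""
    read = found = 0
    for row in range(total_rows):
        read += 1
        if predicate(row):
            found += 1
            if found == row_goal:
                break
    return read


def spread_out(row: int) -> bool:
    # 1,000 matches, one in every 100 rows.
    return row % 100 == 99


def clustered(row: int) -> bool:
    # The same 1,000 matches, all at the end of the scan order.
    return row >= TOTAL_ROWS - MATCHING_ROWS


# Uniformity-based estimate: identical for both layouts because the statistics are.
estimate = ROW_GOAL * TOTAL_ROWS / MATCHING_ROWS
print(estimate)                                           # 1000.0
print(rows_read_until(spread_out, TOTAL_ROWS, ROW_GOAL))  # 1000
print(rows_read_until(clustered, TOTAL_ROWS, ROW_GOAL))   # 99010
```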

Where a row goal plan proves not to be faster, there are usually ways to modify the query or provide better information to the optimizer such that the naturally selected plan is best. Which option is appropriate in your case depends on the details of course. The row goal feature is generally very effective (though there is a bug to watch out for when used in parallel execution plans).

Your particular query and plan may not be suitable for detailed analysis here (by all means provide an actual execution plan if you wish) but hopefully the ideas outlined here will allow you to make forward progress.

Different execution plan for the same query if I change a value in the predicate

The problem was histograms. I regathered statistics with histogram creation disabled, and the execution plan then used nested loops:

BEGIN
  -- SIZE 1 = a single bucket per column, i.e. no histograms are created
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'MIDAS',
    tabname    => 'MINCISOC',
    method_opt => 'FOR ALL COLUMNS SIZE 1');
END;

If I gather statistics again with FOR ALL COLUMNS SIZE AUTO, the same problem returns because the plan uses a hash join. Thanks to Phil for the suggestion.
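The effect of histograms on the row estimate can be sketched with a toy model. The assumptions are as follows. Without a histogram (SIZE 1), every value is treated as equally common, so the estimate is roughly num_rows / NDV. With a frequency histogram, the estimate is the value's actual count. The skewed data, the value names and the join cost formulas are invented for illustration and are not Oracle's internals.

```python
from collections import Counter

# Hypothetical skewed column: one popular value and 999 rare ones (100,000 rows).
column = ["P"] * 50_050
for i in range(999):
    column += [f"V{i}"] * 50

freq = Counter(column)  # stands in for a frequency histogram
num_rows = len(column)


def estimate_rows(value: str, use_histogram: bool) -> float:
    if use_histogram:
        # Frequency histogram: the estimate is the value's recorded count.
        return freq[value]
    # No histogram (SIZE 1): all values assumed equally common, ~num_rows / NDV.
    return num_rows / len(freq)


def choose_join(est_rows: float) -> str:
    # Invented costs: nested loops grow per row, a hash join has a large fixed cost.
    costs = {"NESTED LOOPS": est_rows * 40, "HASH JOIN": 50_000 + est_rows}
    return min(costs, key=costs.get)


for value in ("P", "V7"):
    for use_histogram in (False, True):
        est = estimate_rows(value, use_histogram)
        print(value, use_histogram, est, choose_join(est))
```

Without the histogram both values get the same 100-row estimate and the same nested loops plan. With it, the popular value "P" is estimated at 50,050 rows and the toy switches to a hash join, which mirrors the symptom described above.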
