Why does the query cost change so much and how to prevent it

oracleoracle-11g-r2performancequery-performance

The query causing me the issue is too big, but the piece that seems to be the core is quite simple, I will try to go with that:

The query has an structure like:

SELECT
    ... a lot of stuff ...
WHERE
    ... lots of complex clauses ...
    AND my_id in ( select my_id from small_table where .. something simple ..)

And that has a cardinality of a few billions, bytes read around 12gb and explodes my temp.

However, if I execute the select my_id from small_table, which yields (always) 7 records, take these records and change the query to:

SELECT
    ... a lot of stuff ...
WHERE
    ... lots of complex clauses ...
    AND my_id in ( 1, 2, 3, 4, 5, 6, 7 )

(I mean, values hardcoded)

Cost, cardinality, and bytes read drop dramatically and the query executes in a few minutes.

Now, I have tried to isolate the "small query" in a with clause, tried to use an join instead a sub query and nothing… the result is always the same.

Why is it this way and how could I possibly prevent it from happening?

Maybe worth mentioning that in both cases (fast and slow) the costly part of the query is a FTS on one of the big tables used in the join.

Also, I am using Oracle 11gR2

[EDIT] These are the explain plans of the two example executions

The bad one. Notice I didn't use in ( ), but rather a simple join adding small_table to the from clause.

Plan
SELECT STATEMENT  ALL_ROWSCost: 5,736,441                                   
    22 HASH JOIN RIGHT SEMI  Cost: 5,736,441  Bytes: 52,324,480  Cardinality: 158,080                               
        11 VIEW VIEW VW_NSO_1 Cost: 20  Bytes: 13  Cardinality: 1                           
            10 NESTED LOOPS                         
                8 NESTED LOOPS  Cost: 18  Bytes: 91  Cardinality: 1                     
                    6 NESTED LOOPS  Cost: 4  Bytes: 70  Cardinality: 1                  
                        4 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 1  Bytes: 35  Cardinality: 1           
                            3 INDEX UNIQUE SCAN INDEX (UNIQUE) MYUSER.PK_SMALL_TABLE Cost: 0  Cardinality: 1        
                                2 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2  Bytes: 27  Cardinality: 1   
                                    1 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1  Cardinality: 6  
                        5 TABLE ACCESS FULL TABLE MYUSER.SMALL_TABLE Cost: 3  Bytes: 35  Cardinality: 1             
                    7 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1  Cardinality: 53               
                9 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.MEDIUM_TABLE Cost: 14  Bytes: 420  Cardinality: 20                   
        21 HASH JOIN  Cost: 5,736,193  Bytes: 15,281,925,342  Cardinality: 48,056,369                           
            19 NESTED LOOPS                         
                17 NESTED LOOPS  Cost: 951,151  Bytes: 500,185,440  Cardinality: 1,736,755                      
                    15 NESTED LOOPS  Cost: 3  Bytes: 792  Cardinality: 22               
                        13 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2  Bytes: 27  Cardinality: 1              
                            12 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1  Cardinality: 6         
                        14 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1  Bytes: 477  Cardinality: 53              
                    16 INDEX RANGE SCAN INDEX MYUSER.IDX_HUGE_TABLE_1 Cost: 18,849  Cardinality: 1,322,763                  
                18 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.HUGE_TABLE Cost: 413,818  Bytes: 19,620,720  Cardinality: 77,860                    
            20 TABLE ACCESS FULL TABLE MYUSER.HUGE_TABLE Cost: 3,958,129  Bytes: 12,063,594,150  Cardinality: 402,119,805

The good one. All I did here was taking the result of select my_id from small_table where .. something simple .. (seven records), and added an my_id in ( 1, 2, 3, 4, 5, 6, 7) to the end of the big query. Same as described:

Plan
SELECT STATEMENT  ALL_ROWSCost: 4,558,125                                               
    18 HASH JOIN  Cost: 4,558,125  Bytes: 36,100,625  Cardinality: 122,375                                              
        16 NESTED LOOPS                                         
            14 NESTED LOOPS  Cost: 413,809  Bytes: 5,271,671  Cardinality: 122,597                                      
                12 VIEW VIEW VW_NSO_1 Cost: 20  Bytes: 13  Cardinality: 1                               
                    11 HASH UNIQUE  Bytes: 91  Cardinality: 1                           
                        10 NESTED LOOPS                         
                            8 NESTED LOOPS  Cost: 18  Bytes: 91  Cardinality: 1                     
                                6 NESTED LOOPS  Cost: 4  Bytes: 70  Cardinality: 1                  
                                    4 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 1  Bytes: 35  Cardinality: 1           
                                        3 INDEX UNIQUE SCAN INDEX (UNIQUE) MYUSER.PK_SMALL_TABLE Cost: 0  Cardinality: 1        
                                            2 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2  Bytes: 27  Cardinality: 1   
                                                1 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1  Cardinality: 5  
                                    5 TABLE ACCESS FULL TABLE MYUSER.SMALL_TABLE Cost: 3  Bytes: 35  Cardinality: 1             
                                7 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1  Cardinality: 53               
                            9 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.MEDIUM_TABLE Cost: 14  Bytes: 420  Cardinality: 20                   
                13 INDEX RANGE SCAN INDEX MYUSER.IDX_HUGE_TABLE_1 Cost: 18,849  Cardinality: 1,322,763                                  
            15 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.HUGE_TABLE Cost: 413,788  Bytes: 3,677,910  Cardinality: 122,597                                    
        17 TABLE ACCESS FULL TABLE MYUSER.HUGE_TABLE Cost: 3,962,804  Bytes: 3,655,562,652  Cardinality: 14,506,201

hope it helps.

Best Answer

Most likely your tables have up-to-date statistics but sometimes the optimizer is baffled because it oversimplifies the cardinality estimation.

This seems to be a good candidate for dynamic sampling. In its default value (2 in 10g and 11g), dynamic sampling will only be used if one of the table has no statistics. In your case you would need to change its value to be able to let the optimizer collect statistics to build a better plan.

I suggest you use the DYNAMIC_SAMPLING hint that will let you modify the optimizer behaviour for a single query. I tested with a subquery and you need to use one of the following syntax:

the full hint directly on the top of the query, this will sample all tables, which will definitely work but may take too much time.
```
SELECT /*+ DYNAMIC_SAMPLING (10) */ FROM ...
```

the full hint on the subquery:

SELECT
  ... a lot of stuff ...
WHERE
  ... lots of complex clauses ...
  AND my_id in ( select /*+ DYNAMIC_SAMPLING (10) */ my_id 
                   from small_table 
                  where .. something simple ..)

the hint with a query block name:

SELECT /*+ DYNAMIC_SAMPLING (@my_block 10) */
  ... a lot of stuff ...
WHERE
  ... lots of complex clauses ...
  AND my_id in ( select /*+ QB_NAME(my_block) */ my_id 
                   from small_table 
                  where .. something simple ..)

The second and third option should produce the same result: sampling only on one table.

Related Solutions

Oracle – Why Oracle Isn’t Using an Index for DISTINCT Query

The reason for this behaviour is that rows where LD is NULL cannot be found in the index. Therefore Oracle has to scan the full table. If the table is created with LD as a NOT NULL column then the optimizer uses this information and does an INDEX FAST FULL SCAN. If you add a "CHECK(LD is not null)" constraint to the table that has not NOT NULL defined for the column LD then the optimizer does not use the information provided by the constraint and makes a full table scan again, even if you gave him a hint. Jonathan Lewis wrote about this behaviour.

The following scripts demonstrate this behaviour for Oracle 11.2.0.3.0

*create_table.sql* inserts data into the table and creates index and statistics

set autotrace off
drop table objects
/
create table objects(
  object_id number,
  owner varchar2(30),
  object_name varchar2(128),
  object_type varchar2(19)
)
/

insert into  objects(
  object_id,
  owner,
  object_name,
  object_type
)
select
  object_id,
  owner,
  object_name,
  object_type
from dba_objects
/


create index idx_object_id on objects(object_id);

exec dbms_stats.gather_table_stats(user,'objects', cascade=>true)

Now run the following script:

spool output
set feedback off
set linesize 300
set trimout on
set trimspool on

@create_table
set autotrace traceonly explain

prompt
prompt 1. plan for query with no constraints:
select distinct object_id
from objects;
rem ---------------------------------------------------

@create_table
alter table objects add constraint nn_object_id check(object_id is not null) validate;
set autotrace traceonly explain

prompt
prompt 2. plan for query with check constraint
select distinct object_id
from objects;
rem ---------------------------------------------------

@create_table
alter table objects modify object_id not null;
set autotrace traceonly explain

prompt
prompt 3.plan for query with NOT NULL column
select distinct object_id
from objects;
rem ---------------------------------------------------

@create_table
create bitmap index bidx_object_type on objects(object_type)
/
set autotrace traceonly explain

prompt
prompt 4.plan for query with bitmap index
select distinct object_type
from objects;
rem ---------------------------------------------------


spool off

This gives the following output

1. plan for query with no constraints:

Execution Plan
----------------------------------------------------------
Plan hash value: 4077265613

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   1 |  HASH UNIQUE       |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| OBJECTS | 59063 |   288K|   136   (0)| 00:00:02 |
------------------------------------------------------------------------------


2. plan for query with check constraint

Execution Plan
----------------------------------------------------------
Plan hash value: 4077265613

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   1 |  HASH UNIQUE       |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| OBJECTS | 59063 |   288K|   136   (0)| 00:00:02 |
------------------------------------------------------------------------------


3.plan for query with NOT NULL column

Execution Plan
----------------------------------------------------------
Plan hash value: 4172758181

---------------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               | 59063 |   288K|    40   (8)| 00:00:01 |
|   1 |  HASH UNIQUE          |               | 59063 |   288K|    40   (8)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| IDX_OBJECT_ID | 59063 |   288K|    37   (0)| 00:00:01 |
---------------------------------------------------------------------------------------


4.plan for query with bitmap index

Execution Plan
----------------------------------------------------------
Plan hash value: 2970019208

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    43 |   387 |     6  (34)| 00:00:01 |
|   1 |  HASH UNIQUE                 |                  |    43 |   387 |     6  (34)| 00:00:01 |
|   2 |   BITMAP INDEX FAST FULL SCAN| BIDX_OBJECT_TYPE | 59063 |   519K|     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Summary

If there is a normal B*-tree index on the column NULL values are possible in the column, then the optimizer cannot rely only on the information of the index to do the 'select distinc' and makes a TABLE ACCESS FULL.

If there is a normal B*-tree index and a NOT-NULL check constraint on the column the optimizer also does not rely on the information of the index and makes a TABLE ACCESS FULL.

If there is a normal B*-tree index and the column is defined NOT NULL then the optiomizer relies on the information of the index and does a INDEX FAS FULL SCAN.

If there is a bitmap index on the column then the optimzer knows that all information is in the index and does a BITMAP INDEX FAST FULL SCAN

Different execution plan for the same query if I change a value in the predicate

The problem was histograms, I ran statistics and disabled histogram creation and the execution plan used nested loops:

BEGIN
  DBMS_STATS.GATHER_table_STATS (OWNNAME => 'MIDAS', TABNAME => 'MINCISOC', 
  METHOD_OPT => 'FOR ALL COLUMNS SIZE 1');
END;

If I run it with FOR ALL COLUMNS SIZE AUTO again the same problem because it uses hash join. Thanks to Phil for the suggestion.

Best Answer

Related Solutions

Oracle – Why Oracle Isn’t Using an Index for DISTINCT Query

Different execution plan for the same query if I change a value in the predicate

Related Question