Why does the query cost change so much and how to prevent it

oracleoracle-11g-r2performancequery-performance

The query causing me the issue is too big, but the piece that seems to be the core is quite simple, I will try to go with that:

The query has an structure like:

SELECT
    ... a lot of stuff ...
WHERE
    ... lots of complex clauses ...
    AND my_id in ( select my_id from small_table where .. something simple ..)

And that has a cardinality of a few billions, bytes read around 12gb and explodes my temp.

However, if I execute the select my_id from small_table, which yields (always) 7 records, take these records and change the query to:

SELECT
    ... a lot of stuff ...
WHERE
    ... lots of complex clauses ...
    AND my_id in ( 1, 2, 3, 4, 5, 6, 7 )

(I mean, values hardcoded)

Cost, cardinality, and bytes read drop dramatically and the query executes in a few minutes.

Now, I have tried to isolate the "small query" in a with clause, tried to use an join instead a sub query and nothing… the result is always the same.

Why is it this way and how could I possibly prevent it from happening?

Maybe worth mentioning that in both cases (fast and slow) the costly part of the query is a FTS on one of the big tables used in the join.

Also, I am using Oracle 11gR2

[EDIT] These are the explain plans of the two example executions

The bad one. Notice I didn't use in ( ), but rather a simple join adding small_table to the from clause.

Plan
SELECT STATEMENT  ALL_ROWSCost: 5,736,441                                   
    22 HASH JOIN RIGHT SEMI  Cost: 5,736,441  Bytes: 52,324,480  Cardinality: 158,080                               
        11 VIEW VIEW VW_NSO_1 Cost: 20  Bytes: 13  Cardinality: 1                           
            10 NESTED LOOPS                         
                8 NESTED LOOPS  Cost: 18  Bytes: 91  Cardinality: 1                     
                    6 NESTED LOOPS  Cost: 4  Bytes: 70  Cardinality: 1                  
                        4 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 1  Bytes: 35  Cardinality: 1           
                            3 INDEX UNIQUE SCAN INDEX (UNIQUE) MYUSER.PK_SMALL_TABLE Cost: 0  Cardinality: 1        
                                2 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2  Bytes: 27  Cardinality: 1   
                                    1 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1  Cardinality: 6  
                        5 TABLE ACCESS FULL TABLE MYUSER.SMALL_TABLE Cost: 3  Bytes: 35  Cardinality: 1             
                    7 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1  Cardinality: 53               
                9 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.MEDIUM_TABLE Cost: 14  Bytes: 420  Cardinality: 20                   
        21 HASH JOIN  Cost: 5,736,193  Bytes: 15,281,925,342  Cardinality: 48,056,369                           
            19 NESTED LOOPS                         
                17 NESTED LOOPS  Cost: 951,151  Bytes: 500,185,440  Cardinality: 1,736,755                      
                    15 NESTED LOOPS  Cost: 3  Bytes: 792  Cardinality: 22               
                        13 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2  Bytes: 27  Cardinality: 1              
                            12 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1  Cardinality: 6         
                        14 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1  Bytes: 477  Cardinality: 53              
                    16 INDEX RANGE SCAN INDEX MYUSER.IDX_HUGE_TABLE_1 Cost: 18,849  Cardinality: 1,322,763                  
                18 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.HUGE_TABLE Cost: 413,818  Bytes: 19,620,720  Cardinality: 77,860                    
            20 TABLE ACCESS FULL TABLE MYUSER.HUGE_TABLE Cost: 3,958,129  Bytes: 12,063,594,150  Cardinality: 402,119,805  

The good one. All I did here was taking the result of select my_id from small_table where .. something simple .. (seven records), and added an my_id in ( 1, 2, 3, 4, 5, 6, 7) to the end of the big query. Same as described:

Plan
SELECT STATEMENT  ALL_ROWSCost: 4,558,125                                               
    18 HASH JOIN  Cost: 4,558,125  Bytes: 36,100,625  Cardinality: 122,375                                              
        16 NESTED LOOPS                                         
            14 NESTED LOOPS  Cost: 413,809  Bytes: 5,271,671  Cardinality: 122,597                                      
                12 VIEW VIEW VW_NSO_1 Cost: 20  Bytes: 13  Cardinality: 1                               
                    11 HASH UNIQUE  Bytes: 91  Cardinality: 1                           
                        10 NESTED LOOPS                         
                            8 NESTED LOOPS  Cost: 18  Bytes: 91  Cardinality: 1                     
                                6 NESTED LOOPS  Cost: 4  Bytes: 70  Cardinality: 1                  
                                    4 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 1  Bytes: 35  Cardinality: 1           
                                        3 INDEX UNIQUE SCAN INDEX (UNIQUE) MYUSER.PK_SMALL_TABLE Cost: 0  Cardinality: 1        
                                            2 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2  Bytes: 27  Cardinality: 1   
                                                1 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1  Cardinality: 5  
                                    5 TABLE ACCESS FULL TABLE MYUSER.SMALL_TABLE Cost: 3  Bytes: 35  Cardinality: 1             
                                7 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1  Cardinality: 53               
                            9 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.MEDIUM_TABLE Cost: 14  Bytes: 420  Cardinality: 20                   
                13 INDEX RANGE SCAN INDEX MYUSER.IDX_HUGE_TABLE_1 Cost: 18,849  Cardinality: 1,322,763                                  
            15 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.HUGE_TABLE Cost: 413,788  Bytes: 3,677,910  Cardinality: 122,597                                    
        17 TABLE ACCESS FULL TABLE MYUSER.HUGE_TABLE Cost: 3,962,804  Bytes: 3,655,562,652  Cardinality: 14,506,201  

hope it helps.

Best Answer

Most likely your tables have up-to-date statistics but sometimes the optimizer is baffled because it oversimplifies the cardinality estimation.

This seems to be a good candidate for dynamic sampling. In its default value (2 in 10g and 11g), dynamic sampling will only be used if one of the table has no statistics. In your case you would need to change its value to be able to let the optimizer collect statistics to build a better plan.

I suggest you use the DYNAMIC_SAMPLING hint that will let you modify the optimizer behaviour for a single query. I tested with a subquery and you need to use one of the following syntax:

  • the full hint directly on the top of the query, this will sample all tables, which will definitely work but may take too much time.

    SELECT /*+ DYNAMIC_SAMPLING (10) */ FROM ...
    
  • the full hint on the subquery:

    SELECT
      ... a lot of stuff ...
    WHERE
      ... lots of complex clauses ...
      AND my_id in ( select /*+ DYNAMIC_SAMPLING (10) */ my_id 
                       from small_table 
                      where .. something simple ..)
    
  • the hint with a query block name:

    SELECT /*+ DYNAMIC_SAMPLING (@my_block 10) */
      ... a lot of stuff ...
    WHERE
      ... lots of complex clauses ...
      AND my_id in ( select /*+ QB_NAME(my_block) */ my_id 
                       from small_table 
                      where .. something simple ..)
    

The second and third option should produce the same result: sampling only on one table.