Help to understand explain plan in Oracle

explainoracleoracle-11g-r2performancequery-performance

I am running a query in some big tables, and although it runs fine even tough is a lot of data, I'd like to understand what part of it weighs on the execution. Unfortunately I am not too good with explain plans so I call for help.

Here is some data about these tables:

  • history_state_table 7.424.65 rows (of which only 13.412 are left after t1.alarm_type = 'AT1' )
  • costumer_price_history 448.284.169 rows
  • cycle_table 215 rows

This would be the query (don't mind the logic, is just for the reference):

SELECT t1.id_alarm, t2.load_id, t2.reference_date
  FROM history_state_table t1,
       (SELECT   op_code, contract_num,
                 COUNT (DISTINCT id_ponto) AS num_pontos,
                 COUNT
                    (DISTINCT CASE
                        WHEN vlr > 0
                           THEN id_ponto
                        ELSE NULL
                     END
                    ) AS bigger_than_zero,
                 MAX (load_id) AS load_id,
                 MAX (reference_date) AS reference_date
            FROM costumer_price_history
           WHERE load_id IN
                            (42232, 42234, 42236, 42238, 42240, 42242, 42244) /* arbitrary IDs depending on execution*/
             AND sistema = 'F1'          /* Hardcoded filters */
             AND rec_type = 'F3'         /* Hardcoded filters */
             AND description = 'F3'      /* Hardcoded filters */
             AND extract_type IN
                    ('T1', 'T2', 'T3')
        GROUP BY op_code, contract_num) t2
 WHERE t1.op_code = t2.op_code
   AND t1.contract_num = t2.contract_num
   AND t1.alarm_type = 'AT1'
   AND t1.alarm_status = 'DONE'
   AND (   (    t1.prod_type = 'COMBO'
            AND t2.bigger_than_zero = t2.num_pontos - 1
           )
        OR (    t1.prod_type != 'COMBO'
            AND t2.bigger_than_zero = t2.num_pontos
           )
       )
       /* arbitrary filter depending on execution*/
   AND t1.data_tratado BETWEEN (SELECT data_inicio
                                  FROM cycle_table
                                 WHERE id_ciclo = 160) AND (SELECT data_fim
                                                              FROM cycle_table
                                                             WHERE id_ciclo =
                                                                           160)

And finally the explain plan:

Plan
SELECT STATEMENT  ALL_ROWSCost: 5,485                           
    13 NESTED LOOPS                         
        7 NESTED LOOPS  Cost: 5,483  Bytes: 115  Cardinality: 1                     
            5 VIEW  Cost: 12  Bytes: 59  Cardinality: 1                 
                4 SORT GROUP BY  Cost: 12  Bytes: 85  Cardinality: 1            
                    3 INLIST ITERATOR       
                        2 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.COSTUMER_PRICE_HISTORY Cost: 11  Bytes: 85  Cardinality: 1    
                            1 INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_COSTUMER_PRICE_HISTORY_2 Cost: 10  Cardinality: 3  
            6 INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM Cost: 662  Cardinality: 102,068               
        12 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.HISTORY_STATE_TABLE Cost: 5,471  Bytes: 56  Cardinality: 1                   
            9 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.CYCLE_TABLE Cost: 1  Bytes: 12  Cardinality: 1                
                8 INDEX UNIQUE SCAN INDEX (UNIQUE) RAIDPIDAT.PK_CYCLE_TABLE Cost: 0  Cardinality: 1             
            11 TABLE ACCESS BY INDEX ROWID TABLE RAIDPIDAT.CYCLE_TABLE Cost: 1  Bytes: 12  Cardinality: 1               
                10 INDEX UNIQUE SCAN INDEX (UNIQUE) RAIDPIDAT.PK_CYCLE_TABLE Cost: 0  Cardinality: 1    

Mind that I am not asking "how to rewrite it more efficiently", but rather how do I find witht the explain plan what the most costly operation there. Meanwhile I am reading about it, but I'd appreciate some help.

Best Answer

Explain plan does not tell you what is actually the most costly "operation". The "Cost" column is a guess - it is a value estimated by optimizer. So is "Cardinality" column and "Bytes" column. http://docs.oracle.com/cd/B28359_01/server.111/b28274/ex_plan.htm#i18300

In your example, your optimizer tells you: I decide to use this plan because I guess that looping would cost about 5,483. And I hope this would be the most costly part of the execution, but I can't guarantee this.

The same applies recursively to all the depths of the tree.

If you go in-depth to the lowest levels (that is by intuition most-looped, most-executed levels) you see that the operation that especially sticks out, both in terms of expected cost and expected number of elements, is the

6 INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM Cost: 662  Cardinality: 102,068 

So, optimizer guessed that optimal execution of this query is to loop a lot around a poor workhorse RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM. I really cannot see which part of your query directly relates to it, but I suspect t1.data_tratado condition. And, again, I cannot see if it is really the most costly part.

I'll try to translate the syntax of loops in the explain plan to procedural pseudo-code:

/* begin step 13 (by "step 13" I mean a line that reads "   13 NESTED LOOPS") */
  /* begin step 7 */
    do step 5
    myresult = rows from step 5
    for each row from myresult {
       do step 6
       for each row from step 6 {
           join to a row from myresult the matching row from step 6
       }
    }
  /* end step 7 */
  for each row from myresult {
     do step 12
     for each row from step 12 {
         join to a row from myresult the matching row from step 12
     }
  }
/* end step 13 */
return myresult

Seems complicated, but really aim of each "nested loop" is to create a join (a single table made of two tables) in the most naive way, a loop-inside-a-loop.