Optimizer choosing a bad index

index-tuningoracleoracle-11g-r2

I have a pretty simple select statement,

select column 4 from table_x where column1=:1 and column2=:2 and column3=:3

Columns 1,2 and 3 make up the primary key on the table.

For some reason, rather then using the primary key index, oracle is using a nonclustered index that only contains 1 of the columns.

The only thing that I can find for why the optimizer is ignoring the primary key index, is that the bind variables used in the query are not defined the same way as the columns. The bind variables are all defined as varchar(32), the columns are nchar, number, and nchar. Is this mismatch enough to prevent the optimizer from using the primary key index?

Stats have been updated on the table and all the indexes.

Best Answer

It isn't like there is a rule for the optimizer choosing the index with the very same columns that the query predicates have.

You should always use your bind variables with the proper type. No, in this specific case, those types do not prevent the optimizer from using your index.

Below is just an artificially constructed example. I do not know your table/index structure and data distribution.

drop table t1 purge;
create table t1(c1 nchar(32), c2 number, c3 nchar(32), c4 number);
insert into t1 with g as (select * from dual connect by level <= 1000) select N'HELLO', mod(rownum, 1000), N'WORLD' || rownum, rownum from g,g;
alter table t1 add constraint t1_pk primary key (c1, c2, c3) using index;
create index t1_i1 on t1(c1);
create index t1_i2 on t1(c2);
create index t1_i3 on t1(c3);
exec dbms_stats.gather_table_stats(user, 'T1');

With the above sample data, we have 1 million rows, with C1 having the same value everywhere, C2 having 1000 uniformly distributed different values and C3 being unique (with no constraint on it).

alter session set statistics_level=all;
variable b1 varchar2(32)
variable b2 varchar2(32)
variable b3 varchar2(32)
exec :b1 := 'HELLO';
exec :b2 := '500';
exec :b3 := 'WORLD500';

select c4 from t1 where c1 = :b1 and c2 = :b2 and c3 = :b3;
select * from table(dbms_xplan.display_cursor(format=>'allstats last cost +peeked_binds'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------
SQL_ID  3r9y8nvz3vdtq, child number 0
-------------------------------------
select c4 from t1 where c1 = :b1 and c2 = :b2 and c3 = :b3

Plan hash value: 3324024074

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |     4 (100)|      0 |00:00:00.01 |       3 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |     4   (0)|      0 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN                  | T1_I3 |      1 |      1 |     3   (0)|      0 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (VARCHAR2(30), CSID=873): 'HELLO'
   2 - :2 (VARCHAR2(30), CSID=873): '500'
   3 - (VARCHAR2(30), CSID=873): 'WORLD500'

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("C2"=TO_NUMBER(:B2) AND "C1"=SYS_OP_C2C(:B1)))
   2 - access("C3"=SYS_OP_C2C(:B3))

Notice in Peeked Binds section the type of the variables (VARCHAR2), and in the Predicate Information section how they were automatically converted with the SYS_OP_C2C and TO_NUMBER functions. This implicit conversion in this case does not prevent index usage, but you will not be so lucky with other implicit conversions. (Also notice how I got 0 row as result, while I should have got 1 row as result with those values - but that is a different topic.)

The optimizer chose the index on column C3 instead of the PK index. The cost of the query is 4. So the cost of the PK index should be higher. Let's see with a hint:

explain plan set statement_id = 'PK_HINT' for select /*+ INDEX(t1 t1_pk) */ c4 from t1 where c1 = :b1 and c2 = :b2 and c3 = :b3;
select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 2347959165

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   139 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |   139 |     4   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_PK |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C1"=SYS_OP_C2C(:B1) AND "C2"=TO_NUMBER(:B2) AND
              "C3"=SYS_OP_C2C(:B3))

The usage of PK index is possible, but still, the optimizer did not choose it. Now most people would say their cost is the same (4), but they would be wrong. Such execution plans do not show all the details. We can check those from our plan table (or V$SQL_PLAN). I have already placed the plan of the PK index version there with the above EXPLAIN PLAN statement, now do it for the original as well:

explain plan set statement_id = 'NO_HINT' for select c4 from t1 where c1 = :b1 and c2 = :b2 and c3 = :b3;

We have both plans in the plan table. The important factor not being displayed is the CPU cost:

break on statement_id
select STATEMENT_ID, OPERATION, OPTIONS, OBJECT_NAME, COST, IO_COST, CPU_COST from plan_table order by STATEMENT_ID, ID;

STATEMENT_ID OPERATION            OPTIONS                OBJECT_NAME COST IO_COST CPU_COST
------------ -------------------- ---------------------- ----------- ---- ------- --------
NO_HINT      SELECT STATEMENT                                           4       4    29046
             TABLE ACCESS         BY INDEX ROWID BATCHED T1             4       4    29046
             INDEX                RANGE SCAN             T1_I3          3       3    21564
PK_HINT      SELECT STATEMENT                                           4       4    29746
             TABLE ACCESS         BY INDEX ROWID         T1             4       4    29746
             INDEX                UNIQUE SCAN            T1_PK          3       3    22414

Notice how the COST equals to IO_COST. However, the CPU COST of using the PK index is higher than of the single column index. Simply put, even if the IO cost is the same, processing more columns requires more CPU time.

Also it is a logical decision, because in this example C3 alone is already unique, so using a smaller index with fewer columns, even if not significantly, should be faster.

When in doubt, with so simple statements and low query cost, you should inspect the details as well. If you have a similarly very selective column, this could be the reason for the optimizer choosing the index on it.