Oracle – Using Index on Partitioned Table When Full Scan is Better

oracleperformance

Trying to make some sense out of this observation in Oracle.

Query: SELECT ... FROM MY_TABLE WHERE PART_KEY=x and DT_KEY between 20120101 and 20120731

Uses an index scan on PART_KEY/DT_KEY, even though autotrace shows that the index scan hits more blocks (one at a time, with single-block reads) than a full scan would have hit (in a single multi-block read). Stats are up to date.

The strange part is if I try with a copy of the same table and index WITHOUT partitioning, there seems to be a much higher threshold of selectivity before using the index – the range I gave above did a full scan, and only a very narrow range of values for DT_KEY between d1 and d2 would use the index. I verified the index would be used in the non-partitioned case as well but Oracle seems more biased towards doing a full scan.

How is Oracle making the decision to use the index instead of a full scan, and what else should I be looking at? Stats are up to date since I collected them right away.

Table structure

CREATE TABLE MY_TABLE (
   PART_KEY NUMBER(10) NOT NULL,
   DT_KEY NUMBER(8) NOT NULL,
   ...
) 
PARTITION BY LIST(PART_KEY) (
  PARTITION P1 ...
)
AS SELECT .... FROM [source table]

CREATE INDEX MY_INDEX ON MY_TABLE(PART_KEY, DT_KEY) LOCAL;

EXEC DBMS_STATS.GATHER_TABLE_STATS('[USERNAME]', 'MY_TABLE');

Best Answer

You'd need to do a 10053 trace to see exactly how Oracle is computing its estimated cardinalities. I would assume, though, that Oracle's cardinality estimate for this particular range is too low leading it to use an index where a partition scan would actually be more efficient. Assuming that is the case, I would guess that the problem comes from storing date data in a numeric data type.

If Oracle knows that a particular partition has a min(dt_key) of, say, 20100101 and a max(dt_key) of 20140101, it is going to believe that there is a range of 40,000 values. If there are no histograms and you ask for the range between 20120101 and 20120731, the optimizer will expect that you're going to retrieve 730 of 40,000 possible values or 1.825%. If you used a date to store your date data, however, Oracle would know that there are only 4*365=1460 values and that you're actually retrieving 212 of them or 14.5%. Without knowing the actual range, these are just example calculations, of course, but they show the principle that using the wrong data type can lead to the optimizer getting confused.

Assuming this speculation is correct, you can probably resolve the problem by changing dt_key to a date. You can probably also resolve it by gathering a histogram on dt_key.

Related Solutions

Why is Oracle using the index here

I remember seeing that Oracle will not use the index and perform a full table scan instead if it's retrieving more than 5% or so of all rows.

This is a "rule of thumb" and shouldn't be taken as a prediction. The Oracle CBO chooses an execution plan based on the estimated 'cost' of the options. The estimated cost will depend on various parameters, and the complexity increases with each Oracle release.

You can use hints and explain plan to get a more detailed idea of the relative cost of two plans - but this should be done in a single query because the 'cost' is not guaranteed to be an absolute measure outside of a single plan:

select /*+ FULL(foo) */ * from foo where group_id=10
union all
select /*+ INDEX(i_foo) */ * from foo where group_id=10;

example explain plan with relative costs

Is this the way it's supposed to be?

In short: Yes.

Unless you have identified a particular performance issue you should trust the CBO to choose the correct path (and not, for example, use hints except for testing and experimenting like above). If you have identified a problem, the next step is to start investigating whether the CBO is making incorrect assumptions and how to give it better information - rather than assuming it is broken and/or trying to circumvent it.

Oracle not using index when > used in where clause

Oracle should be able to use the statistics to determine the high and low values for any column if you are keeping them up to date:

drop table foo;
create table foo(id date primary key);
insert into foo select sysdate-level from dual connect by level<31;
select low_value, high_value from dba_tab_columns where table_name='FOO';
/*
LOW_VALUE                                    HIGH_VALUE
-------------------------------------------- ------------------------------------------ 

*/
execute dbms_stats.gather_table_stats('JDOUGLAS', 'FOO');
select low_value, high_value from dba_tab_columns where table_name='FOO';
/*
LOW_VALUE                                    HIGH_VALUE
-------------------------------------------- ------------------------------------------
786F0C0A0E0837                               787001080E0837
*/

Best Answer

Related Solutions

Why is Oracle using the index here

Oracle not using index when > used in where clause

Related Question