Oracle – Using Index on Partitioned Table When Full Scan is Better

oracleperformance

Trying to make some sense out of this observation in Oracle.

Query: SELECT ... FROM MY_TABLE WHERE PART_KEY=x and DT_KEY between 20120101 and 20120731

Uses an index scan on PART_KEY/DT_KEY, even though autotrace shows that the index scan hits more blocks (one at a time, with single-block reads) than a full scan would have hit (in a single multi-block read). Stats are up to date.

The strange part is if I try with a copy of the same table and index WITHOUT partitioning, there seems to be a much higher threshold of selectivity before using the index – the range I gave above did a full scan, and only a very narrow range of values for DT_KEY between d1 and d2 would use the index. I verified the index would be used in the non-partitioned case as well but Oracle seems more biased towards doing a full scan.

How is Oracle making the decision to use the index instead of a full scan, and what else should I be looking at? Stats are up to date since I collected them right away.

Table structure

CREATE TABLE MY_TABLE (
   PART_KEY NUMBER(10) NOT NULL,
   DT_KEY NUMBER(8) NOT NULL,
   ...
) 
PARTITION BY LIST(PART_KEY) (
  PARTITION P1 ...
)
AS SELECT .... FROM [source table]

CREATE INDEX MY_INDEX ON MY_TABLE(PART_KEY, DT_KEY) LOCAL;

EXEC DBMS_STATS.GATHER_TABLE_STATS('[USERNAME]', 'MY_TABLE');

Best Answer

You'd need to do a 10053 trace to see exactly how Oracle is computing its estimated cardinalities. I would assume, though, that Oracle's cardinality estimate for this particular range is too low leading it to use an index where a partition scan would actually be more efficient. Assuming that is the case, I would guess that the problem comes from storing date data in a numeric data type.

If Oracle knows that a particular partition has a min(dt_key) of, say, 20100101 and a max(dt_key) of 20140101, it is going to believe that there is a range of 40,000 values. If there are no histograms and you ask for the range between 20120101 and 20120731, the optimizer will expect that you're going to retrieve 730 of 40,000 possible values or 1.825%. If you used a date to store your date data, however, Oracle would know that there are only 4*365=1460 values and that you're actually retrieving 212 of them or 14.5%. Without knowing the actual range, these are just example calculations, of course, but they show the principle that using the wrong data type can lead to the optimizer getting confused.

Assuming this speculation is correct, you can probably resolve the problem by changing dt_key to a date. You can probably also resolve it by gathering a histogram on dt_key.