Oracle is not using a unique index for a long key

optimizationoracleoracle-11g-r2

I have a table with 250K rows in my test database. (There are a few hundred millions in production, we can observe the same issue there.) The table has an nvarchar2(50) string identifier, not null, with a unique index on it (it's not the PK).

The identifiers are made up of a first part that has 8 different values in my test database (and about a thousand in production), then an @ sign, and finally a number, 1 to 6 digits long. For example there could be 50 thousand rows that start with 'ABCD_BGX1741F_2006_13_20110808.xml@', and it is followed by 50 thousand different numbers.

When I query for a single row based on its identifier, the cardinality is estimated as 1, the cost is very low, it works fine. When I query for more than one row with several identifiers in an IN expression or an OR expression, the estimations for the index are completely wrong, so a full table scan is used. If I force the index with a hint, it is very fast, the full table scan is actually executed an order of magnitude slower (and a lot more slower in production). So it is an optimizer problem.

As a test, I duplicated the table (in the same schema+tablespace) with the exact same DDL and exact same content. I recreated the unique index on the first table for good measure, and created the exact same index on the clone table. I did a DBMS_STATS.GATHER_SCHEMA_STATS('schemaname',estimate_percent=>100,cascade=>true);. You can even see that the index names are consecutive. So now the only difference between the two tables is that the first one was loaded in random order over a long time period, with blocks scattered on the disk (in a tablespace together with several other big tables), the second was loaded as one batched INSERT-SELECT. Other than that, I can't imagine any difference. (The original table has been shrinked since the last big deletion, and there hasn't been a single delete after that.)

Here are query plans for the sick and the clone table (The strings under the black brush are the same all over the picture, and also under they gray brush.):

query plans

(In this example, there are 1867 rows that start with the identifier that is black brushed. A 2-row query produces a cardinality of 1867*2, a 3-row query produces a cardinality of 1867*3, etc. Can't be a coincidence, Oracle seems to not care about the end of the identifiers.)

What could cause this behavior? Obviously it would be pretty expensive to recreate the table in production.

USER_TABLES: http://i.stack.imgur.com/nDWze.jpg USER_INDEXES: http://i.stack.imgur.com/DG9um.jpg I only changed the schema and tablespace name. You can see that the table and index names are the same as on the query plan screenshot.

Best Answer

(This answers the other question about why the histograms are different.)

Histograms are created by default based on column skew and whether the column was used in a relevant predicate. Copying the DDL and the data is not enough, the workload information is also important.

According to the Performance Tuning Guide:

When you drop a table, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures is lost. Without this data, these features do not function properly.

For example, here is a table with skewed data but no histogram:

drop table test1;
create table test1(a date);
insert into test1 select date '2000-01-01'+level from dual connect by level <= 10;
insert into test1 select date '2000-01-01' from dual connect by level <= 1000;
begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
select histogram from user_tab_columns where table_name = 'TEST1';

HISTOGRAM
---------
NONE

Running the same thing, but with a query before the statistics are gathered, will generate a histogram.

drop table test1;
create table test1(a date);
insert into test1 select date '2000-01-01'+level from dual connect by level <= 10;
insert into test1 select date '2000-01-01' from dual connect by level <= 1000;
select count(*) from test1 where a = sysdate; --Only new line
begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
select histogram from user_tab_columns where table_name = 'TEST1';

HISTOGRAM
---------
FREQUENCY