Why isn’t oracle using an index for distinct query

optimizationoracleperformanceperformance-tuning

This is the scenario

SQL> exec dbms_stats.gather_table_stats(user,'TM', cascade=>true)

PL/SQL procedure successfully completed.

SQL> SELECT SEGMENT_NAME , SEGMENT_TYPE , BYTES / 1024 / 1024 MB , BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME IN ('TM', 'TM_LD_IX');

SEGMENT_NAME            SEGMENT_TYPE        MB        BLOCKS
------------------------------------------ ---------- ----------
TM                      TABLE                 296      37888
TM_LD_IX                INDEX                  46       5888

SQL> select index_name , column_name from user_ind_columns where index_name = 'TM_LD_IX';

INDEX_NAME   COLUMN_NAME
------------ ------------------------------
TM_LD_IX     LD

SQL> explain plan for select distinct LD from TM;

Explained.

SQL> @ex

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 4241255022

--------------------------------------------------------------------------------------
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |   693 |  4158 |  7920   (8)| 00:01:36 |
|   1 |  HASH UNIQUE       |                 |   693 |  4158 |  7920   (8)| 00:01:36 |
|   2 |   TABLE ACCESS FULL| TM              |  2549K|    14M|  7486   (3)| 00:01:30 |
--------------------------------------------------------------------------------------

9 rows selected.

SQL> explain plan for select /*+ index(x , TM_LD_IX) */ distinct LD from TM x;

Explained.

SQL> @ex

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 4241255022

--------------------------------------------------------------------------------------
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |   693 |  4158 |  7920   (8)| 00:01:36 |
|   1 |  HASH UNIQUE       |                 |   693 |  4158 |  7920   (8)| 00:01:36 |
|   2 |   TABLE ACCESS FULL| TM              |  2549K|    14M|  7486   (3)| 00:01:30 |
--------------------------------------------------------------------------------------

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for 32-bit Windows: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

As you can see oracle is not using the index on LD and chooses a full table scan instead. I can't even make him use the index with a hist.

In the simple query above I would expect an index fast full scan of TM_LD_IX. my db_file_multiblock_read_count is set to 32 so i'm expecting a cost of about 5888 / 32 = 184 (using the index I can also save the cost of a hash unique).

So, what am I missing here ?

Best Answer

The reason for this behaviour is that rows where LD is NULL cannot be found in the index. Therefore Oracle has to scan the full table. If the table is created with LD as a NOT NULL column then the optimizer uses this information and does an INDEX FAST FULL SCAN. If you add a "CHECK(LD is not null)" constraint to the table that has not NOT NULL defined for the column LD then the optimizer does not use the information provided by the constraint and makes a full table scan again, even if you gave him a hint. Jonathan Lewis wrote about this behaviour.

The following scripts demonstrate this behaviour for Oracle 11.2.0.3.0

*create_table.sql* inserts data into the table and creates index and statistics

set autotrace off
drop table objects
/
create table objects(
  object_id number,
  owner varchar2(30),
  object_name varchar2(128),
  object_type varchar2(19)
)
/

insert into  objects(
  object_id,
  owner,
  object_name,
  object_type
)
select
  object_id,
  owner,
  object_name,
  object_type
from dba_objects
/


create index idx_object_id on objects(object_id);

exec dbms_stats.gather_table_stats(user,'objects', cascade=>true)

Now run the following script:

spool output
set feedback off
set linesize 300
set trimout on
set trimspool on

@create_table
set autotrace traceonly explain

prompt
prompt 1. plan for query with no constraints:
select distinct object_id
from objects;
rem ---------------------------------------------------

@create_table
alter table objects add constraint nn_object_id check(object_id is not null) validate;
set autotrace traceonly explain

prompt
prompt 2. plan for query with check constraint
select distinct object_id
from objects;
rem ---------------------------------------------------

@create_table
alter table objects modify object_id not null;
set autotrace traceonly explain

prompt
prompt 3.plan for query with NOT NULL column
select distinct object_id
from objects;
rem ---------------------------------------------------

@create_table
create bitmap index bidx_object_type on objects(object_type)
/
set autotrace traceonly explain

prompt
prompt 4.plan for query with bitmap index
select distinct object_type
from objects;
rem ---------------------------------------------------


spool off

This gives the following output

1. plan for query with no constraints:

Execution Plan
----------------------------------------------------------
Plan hash value: 4077265613

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   1 |  HASH UNIQUE       |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| OBJECTS | 59063 |   288K|   136   (0)| 00:00:02 |
------------------------------------------------------------------------------


2. plan for query with check constraint

Execution Plan
----------------------------------------------------------
Plan hash value: 4077265613

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   1 |  HASH UNIQUE       |         | 59063 |   288K|   139   (3)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| OBJECTS | 59063 |   288K|   136   (0)| 00:00:02 |
------------------------------------------------------------------------------


3.plan for query with NOT NULL column

Execution Plan
----------------------------------------------------------
Plan hash value: 4172758181

---------------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               | 59063 |   288K|    40   (8)| 00:00:01 |
|   1 |  HASH UNIQUE          |               | 59063 |   288K|    40   (8)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| IDX_OBJECT_ID | 59063 |   288K|    37   (0)| 00:00:01 |
---------------------------------------------------------------------------------------


4.plan for query with bitmap index

Execution Plan
----------------------------------------------------------
Plan hash value: 2970019208

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    43 |   387 |     6  (34)| 00:00:01 |
|   1 |  HASH UNIQUE                 |                  |    43 |   387 |     6  (34)| 00:00:01 |
|   2 |   BITMAP INDEX FAST FULL SCAN| BIDX_OBJECT_TYPE | 59063 |   519K|     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Summary

If there is a normal B*-tree index on the column NULL values are possible in the column, then the optimizer cannot rely only on the information of the index to do the 'select distinc' and makes a TABLE ACCESS FULL.

If there is a normal B*-tree index and a NOT-NULL check constraint on the column the optimizer also does not rely on the information of the index and makes a TABLE ACCESS FULL.

If there is a normal B*-tree index and the column is defined NOT NULL then the optiomizer relies on the information of the index and does a INDEX FAS FULL SCAN.

If there is a bitmap index on the column then the optimzer knows that all information is in the index and does a BITMAP INDEX FAST FULL SCAN