The reason for this behaviour is that rows where LD is NULL cannot be found in the index. Therefore Oracle has to scan the full table. If the table is created with LD as a NOT NULL column then the optimizer uses this information and does an INDEX FAST FULL SCAN. If you add a "CHECK(LD is not null)" constraint to the table that has not NOT NULL defined for the column LD then the optimizer does not use the information provided by the constraint and makes a full table scan again, even if you gave him a hint. Jonathan Lewis wrote about this behaviour.
The following scripts demonstrate this behaviour for Oracle 11.2.0.3.0
*create_table.sql* inserts data into the table and creates index and statistics
set autotrace off
drop table objects
/
create table objects(
object_id number,
owner varchar2(30),
object_name varchar2(128),
object_type varchar2(19)
)
/
insert into objects(
object_id,
owner,
object_name,
object_type
)
select
object_id,
owner,
object_name,
object_type
from dba_objects
/
create index idx_object_id on objects(object_id);
exec dbms_stats.gather_table_stats(user,'objects', cascade=>true)
Now run the following script:
spool output
set feedback off
set linesize 300
set trimout on
set trimspool on
@create_table
set autotrace traceonly explain
prompt
prompt 1. plan for query with no constraints:
select distinct object_id
from objects;
rem ---------------------------------------------------
@create_table
alter table objects add constraint nn_object_id check(object_id is not null) validate;
set autotrace traceonly explain
prompt
prompt 2. plan for query with check constraint
select distinct object_id
from objects;
rem ---------------------------------------------------
@create_table
alter table objects modify object_id not null;
set autotrace traceonly explain
prompt
prompt 3.plan for query with NOT NULL column
select distinct object_id
from objects;
rem ---------------------------------------------------
@create_table
create bitmap index bidx_object_type on objects(object_type)
/
set autotrace traceonly explain
prompt
prompt 4.plan for query with bitmap index
select distinct object_type
from objects;
rem ---------------------------------------------------
spool off
This gives the following output
1. plan for query with no constraints:
Execution Plan
----------------------------------------------------------
Plan hash value: 4077265613
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 59063 | 288K| 139 (3)| 00:00:02 |
| 1 | HASH UNIQUE | | 59063 | 288K| 139 (3)| 00:00:02 |
| 2 | TABLE ACCESS FULL| OBJECTS | 59063 | 288K| 136 (0)| 00:00:02 |
------------------------------------------------------------------------------
2. plan for query with check constraint
Execution Plan
----------------------------------------------------------
Plan hash value: 4077265613
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 59063 | 288K| 139 (3)| 00:00:02 |
| 1 | HASH UNIQUE | | 59063 | 288K| 139 (3)| 00:00:02 |
| 2 | TABLE ACCESS FULL| OBJECTS | 59063 | 288K| 136 (0)| 00:00:02 |
------------------------------------------------------------------------------
3.plan for query with NOT NULL column
Execution Plan
----------------------------------------------------------
Plan hash value: 4172758181
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 59063 | 288K| 40 (8)| 00:00:01 |
| 1 | HASH UNIQUE | | 59063 | 288K| 40 (8)| 00:00:01 |
| 2 | INDEX FAST FULL SCAN| IDX_OBJECT_ID | 59063 | 288K| 37 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
4.plan for query with bitmap index
Execution Plan
----------------------------------------------------------
Plan hash value: 2970019208
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 43 | 387 | 6 (34)| 00:00:01 |
| 1 | HASH UNIQUE | | 43 | 387 | 6 (34)| 00:00:01 |
| 2 | BITMAP INDEX FAST FULL SCAN| BIDX_OBJECT_TYPE | 59063 | 519K| 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Summary
If there is a normal B*-tree index on the column NULL values are possible in the column, then the optimizer cannot rely only on the information of the index to do the 'select distinc' and makes a TABLE ACCESS FULL.
If there is a normal B*-tree index and a NOT-NULL check constraint on the column the optimizer also does not rely on the information of the index and makes a TABLE ACCESS FULL.
If there is a normal B*-tree index and the column is defined NOT NULL then the optiomizer relies on the information of the index and does a INDEX FAS FULL SCAN.
If there is a bitmap index on the column then the optimzer knows that all information is in the index and does a BITMAP INDEX FAST FULL SCAN
In general, the optimizer is much better at figuring out whether a range scan would be more efficient than a skip scan (or vice versa). If the optimizer is making the wrong choice, the problem almost always comes down to the human not providing the optimizer with the proper information in the form of accurate statistics on the table and on the indexes (including things like histograms on columns that contain skewed data).
For any given query, the only way to know with certainty which of two plans is more efficient is to try it out with your data on your hardware with your query. You can try adding the hint to force the plan to change and you can then measure which approach is actually more efficient. If the plan the optimizer is choosing isn't the most efficient, I would tend to suspect that the root cause is that some aspect of the statistics are not correct but that is something that you'd need to look at a 10053 trace to get to the bottom of in most cases. Rarely, there are cases where the optimizer is wrong because the optimizer isn't able to take into account some aspect of the data (for example, until recently, it was unable to deal with situations where the data in multiple columns was not independent such as a car table with make, model, and year columns where knowing one value constrained the set of valid values in the other columns).
Best Answer
An ancient "issue" that is still with us, even in 18.4
The above is kind of explainable, but this is where the numbers get strange (3670 comes from
num_rows
/num_distinct
):rownum
implicitly triggers somefirst_rows_N
optimization that messes up the cardinality (this was discussed on Oracle-l, 12 years ago: https://www.freelists.org/post/oracle-l/ORDER-BY-and-first-rows-10-madness,4). One workaround is to run it withall_rows
hint (yes, even if it is the current setting in the session), also I had to place the hint for the index, otherwise the database would have chosen FTS in this example:Much better. Another reason to use analytic functions or the 12c+ syntax instead of
rownum
. Analytic function:Or 12c+ syntax: