Why is Oracle using the index here

indexoracleoracle-10g

Name     Null     Type      
-------- -------- --------- 
ID       NOT NULL NUMBER(4) 
GROUP_ID          NUMBER(4) 
TEXT              CLOB      

There is a btree index on group_id. Here's how many rows each group_id has and the corresponding percentage:

GROUP_ID               COUNT                  PCT                    
---------------------- ---------------------- ---------------------- 
1                      1                      1                      
2                      2                      1                      
3                      4                      3                      
4                      8                      6                      
5                      16                     12                     
6                      32                     24                     
7                      64                     47                     
8                      9                      7                      

I ran this

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname=>'TEST', cascade=>true);

Which, if I understand correctly, will gather stats for the optimizer.

Now, I remember seeing that Oracle will not use the index and perform a full table scan instead if it's retrieving more than 5% or so of all rows. However, when I ran this query, it only started performing a FTS when group_id was 7, which has 47% of all rows.

Is this the way it's supposed to be?

Best Answer

I remember seeing that Oracle will not use the index and perform a full table scan instead if it's retrieving more than 5% or so of all rows.

This is a "rule of thumb" and shouldn't be taken as a prediction. The Oracle CBO chooses an execution plan based on the estimated 'cost' of the options. The estimated cost will depend on various parameters, and the complexity increases with each Oracle release.

You can use hints and explain plan to get a more detailed idea of the relative cost of two plans - but this should be done in a single query because the 'cost' is not guaranteed to be an absolute measure outside of a single plan:

select /*+ FULL(foo) */ * from foo where group_id=10
union all
select /*+ INDEX(i_foo) */ * from foo where group_id=10;

example explain plan with relative costs

Is this the way it's supposed to be?

In short: Yes.

Unless you have identified a particular performance issue you should trust the CBO to choose the correct path (and not, for example, use hints except for testing and experimenting like above). If you have identified a problem, the next step is to start investigating whether the CBO is making incorrect assumptions and how to give it better information - rather than assuming it is broken and/or trying to circumvent it.