Database Design – Composite Index vs Single-Column Index in Oracle

database-designindexoracle

We have items of three types to be stored and queried according to some business logic.

create table a_table
(
   item_a   varchar2(30),
   item_b   varchar2(16),
   item_c   varchar2(2),
   -- other columns
);

And the indexes

create unique index idx_1 on a_table (item_a);
create unique index idx_2 on a_table (item_b);
create unique index idx_3 on a_table (item_c);

Data for a specific item of a specific type would be read simply as:

-- reading item_a
select ... from a_table where item_a = '...';

-- reading item_b
select ... from a_table where item_b = '...';

-- ...

If an item is of type a, it can't be of type b, so each row must have only one of item_a, item_b or item_c with a value, the others must be null. (This can be enforced in any ways, it does not matter if it impacts insertions.)

In theory the number of type of items could grow (maybe a fourth item type could be added somewhere in the future).

This fact pushed towards another solution which also avoid the ugly mutually exclusive (unchecked in the given example) columns:

create table a_table
(
   item       varchar2(30) not null,
   item_type  varchar2(10) not null,
   -- other columns
);

create unique index idx_1 on a_table (item_type, item);

-- reading item_X
select ... from a_table where item='...' and item_type='item_X';

Disregarding the obvious disadvantage of the first solution when the number of item types increases and other possible design flaws, is there any important performance difference between accessing (select of a row using indexed columns) a table as done in the first case and accessing a table as done in the second one?

The cardinality of item_a and item_b can be millions, the cardinality of item_c is surely under 200.

Notes

  • Assume that there will be an equal number of selects for item_a, item_b and item_c
  • In the first solution, consider a composite index (item_c, item_a, item_b) iff it would be better (performance-wise) than three separate indexes, but do not forget that the comparison must be done with the second solution where both indexed columns will be used in the where condition: assume the use of the best “correct” index/indexes for both cases.

Best Answer

I think, from performance perspective, it will be very close.

But some thoughts:

  1. One column indexes do not store NULL values. So individual indexes will be smaller. Composite index (item_type, item ) size will be bigger than sum of individual indexes. In extreme cases even the height of indexes may differ.

  2. Two of three individual indexes will be unique, so it takes one less consistent read compared to nonunique index.

  3. For item_c column oracle can build precise histogram.

  4. In the case of (item_type, item ), oracle will correctly calculate cardinalities for queries with item types A and B, but for "item_type='C' and item=:X" it can underestimate. This is may be not a problem, but in complex queries with joins can give ineffective plan. So you need to check.

  5. About to compare index (item_a,item_b,item_c) with (item_type, item ) - like others said it will work almost only with queries like "item_a=:X", and will not work when "item_b=:Y". So using index (item_a,item_b,item_c) for first case is just a bad idea when we have equal number of queries on item_b and item_c columns.

    create table hr.tt nologging
    as
    select case when mod(rownum,3) = 0 then rownum end AS item_a,
           case when mod(rownum,3) = 1 then rownum end AS item_b,
           case when mod(rownum,3) = 2 then round(dbms_random.value(1,200)) end AS item_c,
           o.* 
     from all_objects o, 
    (select * from dual connect by level < 100) d 
     where rownum <= 5e6; 

    create unique index hr.item_a_idx on hr.tt(item_a) nologging;

    create unique index hr.item_b_idx on hr.tt(item_b) nologging;      

    create index hr.item_c_idx on hr.tt(item_c) nologging;


    create table hr.tt2 nologging
    as
    select case when mod(rownum,3) in (0,1) then rownum
                else round(dbms_random.value(1,200)) 
                end AS item,
           case when mod(rownum,3) = 0 then 'A' 
                when mod(rownum,3) = 1 then 'B' 
                when mod(rownum,3) = 2 then 'C' end AS item_type,
           o.* 
     from all_objects o, 
    (select * from dual connect by level < 100) d 
     where rownum <= 5e6;

    create index hr.item_item_type_idx on hr.tt2(item_type, item) nologging;

some queries

SYS@mydb> select * from hr.tt where item_a = 300;


------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   197 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT         |     1 |   197 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | ITEM_A_IDX |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
...
          4  consistent gets
SYS@mydb> select * from hr.tt2 where item_type = 'A'  and item = 300;

1 row selected.


--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |   115 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT2                |     1 |   115 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ITEM_ITEM_TYPE_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          5  consistent gets

Below is an example where optimizer calculations are wrong (oracle 11.2). Optimizer thinks there will be only one row in the "INDEX RANGE SCAN". Actually we see 8438 rows. Extended optimizer statistics on these two columns may be helpful, may be not.

SYS@mydb> select * from hr.tt2 where item_type = 'C'  and item = 150;

8438 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4063424880

--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |   115 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT2                |     1 |   115 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ITEM_ITEM_TYPE_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_TYPE"='C' AND "ITEM"=150)


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       8618  consistent gets
       . . .
       8438  rows processed

And below is the same case for individual item_c column. Here optimizer calculations are correct - 8133 rows estimated, 8290 rows selected.

SYS@mydb> select * from hr.tt where item_c = 150;

8290 rows selected.


Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |  8133 |   826K|  7737   (1)| 00:01:33 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT         |  8133 |   826K|  7737   (1)| 00:01:33 |
|*  2 |   INDEX RANGE SCAN          | ITEM_C_IDX |  8133 |       |    19   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_C"=150)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       8499  consistent gets
       . . .
       8290  rows processed