Database Design – Composite Index vs Single-Column Index in Oracle

database-designindexoracle

We have items of three types to be stored and queried according to some business logic.

create table a_table
(
   item_a   varchar2(30),
   item_b   varchar2(16),
   item_c   varchar2(2),
   -- other columns
);

And the indexes

create unique index idx_1 on a_table (item_a);
create unique index idx_2 on a_table (item_b);
create unique index idx_3 on a_table (item_c);

Data for a specific item of a specific type would be read simply as:

-- reading item_a
select ... from a_table where item_a = '...';

-- reading item_b
select ... from a_table where item_b = '...';

-- ...

If an item is of type a, it can't be of type b, so each row must have only one of item_a, item_b or item_c with a value, the others must be null. (This can be enforced in any ways, it does not matter if it impacts insertions.)

In theory the number of type of items could grow (maybe a fourth item type could be added somewhere in the future).

This fact pushed towards another solution which also avoid the ugly mutually exclusive (unchecked in the given example) columns:

create table a_table
(
   item       varchar2(30) not null,
   item_type  varchar2(10) not null,
   -- other columns
);

create unique index idx_1 on a_table (item_type, item);

-- reading item_X
select ... from a_table where item='...' and item_type='item_X';

Disregarding the obvious disadvantage of the first solution when the number of item types increases and other possible design flaws, is there any important performance difference between accessing (select of a row using indexed columns) a table as done in the first case and accessing a table as done in the second one?

The cardinality of item_a and item_b can be millions, the cardinality of item_c is surely under 200.

Notes

Assume that there will be an equal number of selects for item_a, item_b and item_c
In the first solution, consider a composite index (item_c, item_a, item_b) iff it would be better (performance-wise) than three separate indexes, but do not forget that the comparison must be done with the second solution where both indexed columns will be used in the where condition: assume the use of the best “correct” index/indexes for both cases.

Best Answer

I think, from performance perspective, it will be very close.

But some thoughts:

One column indexes do not store NULL values. So individual indexes will be smaller. Composite index (item_type, item ) size will be bigger than sum of individual indexes. In extreme cases even the height of indexes may differ.
Two of three individual indexes will be unique, so it takes one less consistent read compared to nonunique index.
For item_c column oracle can build precise histogram.
In the case of (item_type, item ), oracle will correctly calculate cardinalities for queries with item types A and B, but for "item_type='C' and item=:X" it can underestimate. This is may be not a problem, but in complex queries with joins can give ineffective plan. So you need to check.
About to compare index (item_a,item_b,item_c) with (item_type, item ) - like others said it will work almost only with queries like "item_a=:X", and will not work when "item_b=:Y". So using index (item_a,item_b,item_c) for first case is just a bad idea when we have equal number of queries on item_b and item_c columns.

    create table hr.tt nologging
    as
    select case when mod(rownum,3) = 0 then rownum end AS item_a,
           case when mod(rownum,3) = 1 then rownum end AS item_b,
           case when mod(rownum,3) = 2 then round(dbms_random.value(1,200)) end AS item_c,
           o.* 
     from all_objects o, 
    (select * from dual connect by level < 100) d 
     where rownum <= 5e6; 

    create unique index hr.item_a_idx on hr.tt(item_a) nologging;

    create unique index hr.item_b_idx on hr.tt(item_b) nologging;      

    create index hr.item_c_idx on hr.tt(item_c) nologging;


    create table hr.tt2 nologging
    as
    select case when mod(rownum,3) in (0,1) then rownum
                else round(dbms_random.value(1,200)) 
                end AS item,
           case when mod(rownum,3) = 0 then 'A' 
                when mod(rownum,3) = 1 then 'B' 
                when mod(rownum,3) = 2 then 'C' end AS item_type,
           o.* 
     from all_objects o, 
    (select * from dual connect by level < 100) d 
     where rownum <= 5e6;

    create index hr.item_item_type_idx on hr.tt2(item_type, item) nologging;

some queries

SYS@mydb> select * from hr.tt where item_a = 300;


------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   197 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT         |     1 |   197 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | ITEM_A_IDX |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
...
          4  consistent gets

SYS@mydb> select * from hr.tt2 where item_type = 'A'  and item = 300;

1 row selected.


--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |   115 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT2                |     1 |   115 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ITEM_ITEM_TYPE_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          5  consistent gets

Below is an example where optimizer calculations are wrong (oracle 11.2). Optimizer thinks there will be only one row in the "INDEX RANGE SCAN". Actually we see 8438 rows. Extended optimizer statistics on these two columns may be helpful, may be not.

SYS@mydb> select * from hr.tt2 where item_type = 'C'  and item = 150;

8438 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4063424880

--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |   115 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT2                |     1 |   115 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ITEM_ITEM_TYPE_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_TYPE"='C' AND "ITEM"=150)


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       8618  consistent gets
       . . .
       8438  rows processed

And below is the same case for individual item_c column. Here optimizer calculations are correct - 8133 rows estimated, 8290 rows selected.

SYS@mydb> select * from hr.tt where item_c = 150;

8290 rows selected.


Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |  8133 |   826K|  7737   (1)| 00:01:33 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT         |  8133 |   826K|  7737   (1)| 00:01:33 |
|*  2 |   INDEX RANGE SCAN          | ITEM_C_IDX |  8133 |       |    19   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_C"=150)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       8499  consistent gets
       . . .
       8290  rows processed

Related Solutions

Mysql – How to index this table (a_level, b_level, item_id)

Since your table only three columns, this setup can be straightforward.

Here is an ideal table layout for items

CREATE TABLE items
(
    item_id int not null auto_increment,
    a_level int not null,
    b_level int not null,
    KEY  ai  (a_level,item_id),
    KEY  bi  (b_level,item_id),
    KEY  abi (a_level,b_level,item_id),
    KEY  bai (b_level,a_level,item_id),
    PRIMARY KEY     (item_id)
);

Here are the realm of possible queries

SELECT * FROM items WHERE id = 100; (uses the PRIMARY KEY)
SELECT * FROM items WHERE a_level = 200; (uses KEY ai)
SELECT * FROM items WHERE b_level = 300; (uses KEY bi)
SELECT * FROM items WHERE a_level = 400 order by b_level (uses KEY abi ordering all b_level values under a_level 400 and orders item_id under each b_level)

YYou may try out other combinations of SELECT...GROUP BY...ORDER BY.... You should also run EXPLAIN on every SELECT query and see which indexes are summoned for query execution. Indexes that do not appear as a chosen index can be eliminated (dropped) over the course of time.

Here were questions you asked?

If i INDEX both on a_level, b_level (and of course item_id) and SELECT WHERE a_level= b_level= item_id= will it be bad? I guess only INDEX on b_level and item_id and SELECT WHERE b_level= AND item_id= will be enough/the best solution?

Those queries are not bad given the indexes I just presented.

The only reason why the item_id appears in all the indexes is to make those indexes covering indexes. That way, the retrieval of data is done using the index only, bypassing the table.

MySQL – Creating Unique Index for Specific Column Values

You don't have to check what value you could use in "deleted" if you have a way of generating unique integers, and MySQL does... the UUID_SHORT() function generates a unique and monotonic BIGINT UNSIGNED value every time it's called.

The function is good for 16.7M new values per second, every second, and will never return a duplicate (or smaller than previous) value as long as the @@SERVER_ID remains the same and you don't shut down the server, set the system clock backward in time, and restart the server... and even if you do, the odds of duplicate values are still very small, but no longer truly impossible.

You could also make "deleted" be a DATETIME(6) or TIMESTAMP(6) on MySQL 5.6, using '0000-00-00 00:00:00.000000' to mean "not deleted" and NOW() to mean deleted.

Try as I might, I have not been able to get MySQL to produce a NOW() collision down to the microsecond.

Notes

Best Answer

Related Solutions

Mysql – How to index this table (a_level, b_level, item_id)

MySQL – Creating Unique Index for Specific Column Values

Related Question