Does indexed column with millions of different values make sense

indexindex-tuningoracleoracle-12c

I noticed that one of my queries was pretty slow, taking about two seconds to execute.

This is the slow query:

SELECT EVENT_DATETIME, EVENT_EVENTTYPE FROM EVENT WHERE (EVENT_ID = :1  AND EVENT_RESULT = :2  )

The index used was created like this:

CREATE INDEX "INDEX_EVENT_DATETIME" ON "EVENT" ("EVENT_DATETIME", "EVENT_ID")

The execution plan shows an index skip scan because obviously the EVENT_DATETIME column is not in the where clause.

So at first I thought I would create a second concatenated index on the table for the columns (EVENT_ID, EVENT_RESULT) but then I noticed that the column EVENT_DATETIME has 18,702,145 distinct values. Of a total of 18,722,706 rows.

There are 4 different queries happening regularly on this table. All of them except one have the EVENT_DATETIME column in the where clause but all of them have EVENT_ID in the where clause.

So my assumption is that the index as it is currently being used doesn't make the queries more efficient.

Is it correct that a concatenated index on the columns (EVENT_ID, EVENT_DATETIME) would be more useful?

Best Answer

For b-tree index , the more distinct values the better.

Index skip scan doesn't seem to be an efficient access method in your case because the first column in the index has high cardinality. It works opposite to "regular" index usage like index range scan or index unique scan where you want the first column to be the most discriminating. It's hard to say if in this particular case full table scan would behave better without trying (FULL or NO_INDEX hints can do the job) .

No doubt, for the query in question index on event_id,event_date is much better than on event_date,event_id.

It's hard to tell without seeing your other queries, but I think you may not necessarily need to have composite indexes. Each column, event_date and event_id, appear to be have very few duplicates (if any) . Then having two indexes, one on EVENT_DATE and another on EVENT_ID will work better than having multiple concatenated indexes - even if a composite index is covering for some queries, the engine will have to do only one or two extra lookups .