Select performance problems with conditional index vs non-conditional on large volume table

index, oracle, oracle-11g, performance, query-performance

We are trying to change some indexes on a large-volume table to conditional indexes (function-based indexes) to reduce the space used by indexes on this table. In testing, insert and update times are only minimally affected and easily acceptable. Reads where most of the filtering is done on the indexed columns show almost no change between the non-conditional and conditional index.

However, reads where filtering on the indexed columns still leaves a large number of rows, with significant further filtering on columns not in the index, show a big difference. After adding some data, dropping and re-creating the indexes, and gathering statistics, the first read showed similar times between our baseline (non-conditional index) and the conditional index. Subsequent runs of the same query improved greatly for the baseline but did not improve with the conditional index. I suspect the baseline queries are taking advantage of caching that the conditional index is not.

Has anyone encountered similar performance problems using conditional indexes on Oracle? Any advice on how we can determine the exact cause and, hopefully, avoid this slower performance with conditional indexes?

We are using Oracle 11g; running the queries in SQL*Plus or from an application using OLE DB makes no difference. We have verified that the explain plan for both cases shows the same query plan; only the costs change.
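To confirm the runtime plans really match (not just the EXPLAIN PLAN estimates), the actual plan with estimated vs. actual row counts can be pulled from the cursor cache. This is a sketch, assuming the query is first run in the same session with the GATHER_PLAN_STATISTICS hint:

```sql
-- Run the query once with runtime statistics collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(1)
FROM EMPLOYEE
WHERE ...;  -- same predicates as the problem query

-- Then fetch the actual plan; compare E-Rows (estimated) to A-Rows (actual)
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```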

I'll be changing the table and field names here but the structure is the same:

Baseline Index:

CREATE INDEX IEMP_CMPHIRDEP ON EMPLOYEE (COMPANY, HIRE_DATE, DEPT_ID, EMP_ID);

Function-based version (here a new column ARCHIVED is added to the EMPLOYEE table, and the function-based index is made conditional on this column being 'N'; a row with any other value is not included in the index):

CREATE INDEX IEMP_CMPHIRDEP ON EMPLOYEE (
    CASE WHEN ARCHIVED = 'N' THEN COMPANY END,
    CASE WHEN ARCHIVED = 'N' THEN HIRE_DATE END,
    CASE WHEN ARCHIVED = 'N' THEN DEPT_ID END,
    CASE WHEN ARCHIVED = 'N' THEN EMP_ID END
);
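Oracle implements each CASE expression above as a hidden virtual column on the table. As a quick sanity check (a sketch, assuming the index is in your own schema), the stored expressions can be verified with:

```sql
-- List the expressions actually indexed, in key order
SELECT COLUMN_POSITION, COLUMN_EXPRESSION
FROM USER_IND_EXPRESSIONS
WHERE INDEX_NAME = 'IEMP_CMPHIRDEP'
ORDER BY COLUMN_POSITION;
```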

Query which illustrates the performance issue (the baseline query is identical except that it omits the CASE expressions, which are needed to hit the function-based index):

SELECT /*+ ORDERED  index(EMPLOYEE IEMP_CMPHIRDEP)*/ * 
FROM EMPLOYEE 
WHERE ((CASE WHEN ARCHIVED = 'N' THEN EMPLOYEE.COMPANY END = 'A') 
        AND (CASE WHEN ARCHIVED = 'N' THEN EMPLOYEE.HIRE_DATE END = '09-Sep-2015')
        AND (CASE WHEN ARCHIVED = 'N' THEN EMPLOYEE.DEPT_ID END = 'SALES')  
        AND (EMPLOYEE.LOCATION >= 10000 AND EMPLOYEE.LOCATION <= 10000) 
        AND (EMPLOYEE.PAY >= 30 AND EMPLOYEE.PAY <= 32) 
        AND (EMPLOYEE.SKILLS IN ('A','B','C','D','E'))) 
ORDER BY EMPLOYEE.DEPT_ID,EMPLOYEE.ORG_CODE,EMPLOYEE.EMP_ID;

In this case, filtering by COMPANY, HIRE_DATE, and DEPT_ID, which are in the index, only narrows the results down to ~2 million rows. The further filtering on LOCATION and PAY narrows the results down to 965 rows. The EMPLOYEE table has over 62 million rows.

Here are the times (in milliseconds) for 10 consecutive reads using that query in the baseline environment:

  1. 71619.51
  2. 29827.58
  3. 30306.79
  4. 30365.6
  5. 29871.75
  6. 29863.03
  7. 30345.64
  8. 30378.42
  9. 29860.57
  10. 29776.82

Here are the times (in milliseconds) for 10 consecutive reads in the conditional index environment. All data in the EMPLOYEE table is identical except for the added ARCHIVED column:

  1. 43797.05
  2. 43744.57
  3. 43562.32
  4. 43738.16
  5. 43700.51
  6. 43759.54
  7. 43643.28
  8. 43814.34
  9. 43607.4
  10. 43797.52
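One way to test the caching suspicion is SQL*Plus AUTOTRACE, which reports logical vs. physical I/O for each execution. A sketch:

```sql
SET AUTOTRACE TRACEONLY STATISTICS
-- Run the problem query here, several times in a row, and compare
-- "consistent gets" (buffer cache reads) and "physical reads" (disk I/O)
-- between the baseline and conditional-index environments.
SELECT /*+ index(EMPLOYEE IEMP_CMPHIRDEP) */ COUNT(1)
FROM EMPLOYEE
WHERE ...;  -- same predicates as above
SET AUTOTRACE OFF
```

If the baseline's physical reads drop to near zero on repeat runs while the conditional index's do not, that would support the caching hypothesis.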

Edit 1:
Edit 1:
The explain plans I ran earlier in SQL Developer showed no difference other than cost. This morning, though, we ran the queries in SQL*Plus (using SELECT COUNT(1) instead of selecting all of the columns) and our DBA captured the explain plans for each (I'm the developer working on this project). One thing we noticed: the Rows estimate for the index range scan differs greatly between the two.

Baseline:

Plan hash value: 68573393
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |     1 |    43 |    61   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |                      |     1 |    43 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEE             |     1 |    43 |    61   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IEMP_CMPHIRDEP       |    67 |       |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Function based:

Plan hash value: 68573393
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |     1 |    20 |  1238   (1)| 00:00:15 |
|   1 |  SORT AGGREGATE              |                      |     1 |    20 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEE             |     1 |    20 |  1238   (1)| 00:00:15 |
|*  3 |    INDEX RANGE SCAN          | IEMP_CMPHIRDEP       |  1433 |       |   544   (1)| 00:00:07 |
-----------------------------------------------------------------------------------------------------
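The jump from 67 to 1433 estimated rows on the range scan suggests the optimizer may be missing or misusing statistics on the hidden virtual columns behind the function-based index. Gathering them explicitly is one thing to try (a sketch; USER resolves to the current schema):

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'EMPLOYEE',
    method_opt => 'FOR ALL HIDDEN COLUMNS SIZE AUTO',
    cascade    => TRUE);
END;
/
```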

Best Answer

All the comments above are appropriate.

If you provide details on the data types and any NOT NULL constraints of the columns in question, we can better determine whether your function-based approach is appropriate. I would also gather statistics on the indexes and look at the number of distinct values, etc. You can compress your baseline index too: if, say, you have very few distinct values for COMPANY and HIRE_DATE, you can compress on the first two columns. In fact, you can do the following to see whether index compression will help:

analyze index IEMP_CMPHIRDEP validate structure;

Then query the view index_stats:

SYS@instance> select OPT_CMPR_COUNT, OPT_CMPR_PCTSAVE from index_stats;

OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
-------------- ----------------
             3               28

In the above example, compressing the index on the first 3 columns would save 28% of its space. If the savings are worthwhile, rebuild with compression:

alter index IEMP_CMPHIRDEP rebuild compress 3;
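To look at the distinct values mentioned above, the index statistics can be gathered and then inspected in the dictionary (a sketch using standard DBMS_STATS and USER_INDEXES):

```sql
BEGIN
  DBMS_STATS.GATHER_INDEX_STATS(ownname => USER, indname => 'IEMP_CMPHIRDEP');
END;
/

-- DISTINCT_KEYS vs. NUM_ROWS shows how selective the full key is;
-- CLUSTERING_FACTOR indicates how scattered the table rows are per index order
SELECT DISTINCT_KEYS, NUM_ROWS, CLUSTERING_FACTOR, LEAF_BLOCKS
FROM USER_INDEXES
WHERE INDEX_NAME = 'IEMP_CMPHIRDEP';
```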