Outer Join suppressing Index usage

index-tuningoracleoracle-12cperformancequery-performance

I have a client program that is executing a query against a View that outer joins one table to another. The performance is bad and I have been trying to tune it by adding the right index. The query in question is only actually using the second table, so I have been testing against that table directly.

I found (several) indexes that worked fine for the query against the table but when I switched it to use the View, they stopped using any indexes and just did full scans on both tables instead. As these tables are large (2-3 million rows each) this is very slow.

To simply testing, I changed the query to bypass the and just incorporate the outer join into the query itself. This successfully reproduced the problem, but left the mystery of why the outer join wouldn't use the indexes.

Here's the table, with all of the indexes I have added while testing:

  CREATE TABLE TEST_DATA 
   (ID NUMBER(11,0)  PRIMARY KEY, 
    FORMATTED_RESULT VARCHAR2(255 BYTE), 
    F_RESULT NUMBER, 
    IDNUM NUMBER(11,0), 
    IDNUM_DESCRIPTION VARCHAR2(128 BYTE), 
    LAB_NUMBER NUMBER(11,0), 
    SEQ_NUMBER NUMBER(11,0),
    ORDERNO NUMBER(11,0),
    SUPPL_FORMATTED_RESULT VARCHAR2(255 BYTE), 
    SUPPL_IDNUM NUMBER(11,0), 
    SUPPL_IDNUM_DESCRIPTION VARCHAR2(128 BYTE), 
    SUPPL_UNIT VARCHAR2(16 BYTE)
   ) ;

  CREATE UNIQUE INDEX TEST_LN_SQN_ORDER ON TEST_DATA (LAB_NUMBER, SEQ_NUMBER, ORDERNO) ;
  CREATE INDEX TEST_LN_SQN ON TEST_DATA (LAB_NUMBER, SEQ_NUMBER) ;
  CREATE INDEX TD_CUIDD_CUFR ON TEST_DATA (UPPER(COALESCE(SUPPL_IDNUM_DESCRIPTION,IDNUM_DESCRIPTION)), UPPER(COALESCE(SUPPL_FORMATTED_RESULT,FORMATTED_RESULT))) ;
  CREATE INDEX TD_UFR_IDN ON TEST_DATA (UPPER(FORMATTED_RESULT), IDNUM) ;
  CREATE INDEX TD_UIDD_UFR ON TEST_DATA (UPPER(IDNUM_DESCRIPTION), UPPER(FORMATTED_RESULT)) ;
  CREATE INDEX TD_CUFR_CIDN_SN_LN ON TEST_DATA (UPPER(COALESCE(SUPPL_FORMATTED_RESULT,FORMATTED_RESULT)), COALESCE(SUPPL_IDNUM,IDNUM), SEQ_NUMBER, LAB_NUMBER) ;
  CREATE INDEX TD_SN_LN_CUFR_CIDN ON TEST_DATA (SEQ_NUMBER, LAB_NUMBER, UPPER(COALESCE(SUPPL_FORMATTED_RESULT,FORMATTED_RESULT)), COALESCE(SUPPL_IDNUM,IDNUM)) ;
  CREATE INDEX TD_CUFR_CIDN ON TEST_DATA (UPPER(COALESCE(SUPPL_FORMATTED_RESULT,FORMATTED_RESULT)), COALESCE(SUPPL_IDNUM,IDNUM)) ;

Here's the other table (the one that we don't really use for this query)

  CREATE TABLE REQUEST_INFO 
   (NUMBER(11,0) PRIMARY KEY, 
    CHARGE_CODE VARCHAR2(32 BYTE), 
    LAB_NUMBER NUMBER(11,0), 
    SEQ_NUMBER NUMBER(11,0)
   ) ;

  CREATE INDEX RI_LN_SN ON REQUEST_INFO (LAB_NUMBER, SEQ_NUMBER) ;
  CREATE INDEX RI_SN_LN ON REQUEST_INFO (SEQ_NUMBER, LAB_NUMBER) ;

So first, here's the query against the single table directly, that successfully uses one of the indexes.

-- GOOD, Uses index : TD_CUFR_CIDN_SN_LN
select td.LAB_NUMBER 
from test_DATA td 
where UPPER(COALESCE(SUPPL_FORMATTED_RESULT,FORMATTED_RESULT))='491(10)376'
  and COALESCE(TD.SUPPL_IDNUM, TD.IDNUM)=40549 
;

Now here's the query using both tables with an inner join. This also uses the indexes and runs fast.

-- GOOD, Uses indexes : TD_CUFR_CIDN_SN_LN AND RI_SN_LN
select TD.LAB_NUMBER  
from REQUEST_INFO RI 
JOIN TEST_DATA TD ON  TD.LAB_NUMBER = RI.LAB_NUMBER AND TD.SEQ_NUMBER = RI.SEQ_NUMBER 
where UPPER(COALESCE(TD.SUPPL_FORMATTED_RESULT,TD.FORMATTED_RESULT))='491(10)376'
  and COALESCE(TD.SUPPL_IDNUM, TD.IDNUM)=40549 

And here is the same query with a Left Outer Join, as it is written in the view. This does NOT use any of the indexes and runs very slowly.

-- BAD, does not use indexes
select TD.LAB_NUMBER 
from REQUEST_INFO RI 
LEFT JOIN TEST_DATA TD ON  TD.LAB_NUMBER = RI.LAB_NUMBER AND TD.SEQ_NUMBER = RI.SEQ_NUMBER 
where UPPER(COALESCE(TD.SUPPL_FORMATTED_RESULT,TD.FORMATTED_RESULT))='491(10)376'
  and COALESCE(TD.SUPPL_IDNUM, TD.IDNUM)=40549 
;

Now before anyone says it: this query is actually logically identical to the previous one. This is because the WHERE clause is filtering on columns from the outer table (TD), which effectively/logically turns an outer join into an inner join (this is why it matters whether conditions occur in the ON clause vs the WHERE clause).

Now, just to add to the weirdness, I decided to see what would happen if I made the Outer to Inner coercion more explict:

-- GOOD, Uses indexes : TD_CUFR_CIDN_SN_LN AND RI_SN_LN
select TD.LAB_NUMBER 
from REQUEST_INFO RI 
LEFT JOIN TEST_DATA TD ON  TD.LAB_NUMBER = RI.LAB_NUMBER AND TD.SEQ_NUMBER = RI.SEQ_NUMBER 
where UPPER(COALESCE(TD.SUPPL_FORMATTED_RESULT,TD.FORMATTED_RESULT))='491(10)376'
  and COALESCE(TD.SUPPL_IDNUM, TD.IDNUM)=40549 
and TD.LAB_NUMBER IS NOT NULL
;

Incredibly, this worked!

So the question here is, 1) WHY doesn't Oracle figure this out itself?

And 2) Is there some setting or Index, etc. that I can create that will get Oracle to figure this out correctly and use the indexes?

Additional considerations:

  • The view is used by a variety of other queries and clients, so I cannot just change it to an inner join for this one query.

  • The client is generating the query, so it is difficult/nigh-impossible to alter the query with quirky special-case conditions like: "Use this view for this data, unless you only need these columns from this one table, then use a different view", or "when you need these columns and only these columns from this one table, then add an 'IS NOT NULL' to the WHERE clause"

Any suggestions or insights would be welcome.


UPDATE: I just tried it on Oracle 11g as well, I get the exact same results there.


Per request, here is the Explain Plan output, first the good version, where it uses indexes:

Rows      Plan                                       COST    Predicates
        3 SELECT STATEMENT                                 8 
        3  HASH JOIN                                       8 Access:TD.LAB_NUMBER=RI.LAB_NUMBER AND TD.SEQ_NUMBER=RI.SEQ_NUMBER
        3   NESTED LOOPS                                   8 
             STATISTICS COLLECTOR
        3     INDEX RANGE SCAN TD_CUFR_CIDN_SN_LN          4 Access:UPPER(COALESCE(SUPPL_FORMATTED_RESULT,FORMATTED_RESULT))='491(10)376' AND COALESCE(SUPPL_IDNUM,IDNUM)=40549, Filter:TD.LAB_NUMBER IS NOT NULL
        1    INDEX RANGE SCAN RI_SN_LN                     2 Access:TD.SEQ_NUMBER=RI.SEQ_NUMBER AND TD.LAB_NUMBER=RI.LAB_NUMBER
        1   INDEX FAST FULL SCAN RI_SN_LN                  2

And now the bad version:

Rows      Plan                                       COST    Predicates
 31939030 SELECT STATEMENT                            910972
           FILTER                                             Filter:UPPER(COALESCE(SUPPL_FORMATTED_RESULT,FORMATTED_RESULT))='491(10)376' AND COALESCE(SUPPL_IDNUM,IDNUM)=40549
 31939030   HASH JOIN OUTER                           910972 Access:TD.LAB_NUMBER(+)=RI.LAB_NUMBER AND TD.SEQ_NUMBER(+)=RI.SEQ_NUMBER
  6213479    TABLE ACCESS FULL REQUEST_INFO            58276
 56276228    TABLE ACCESS FULL TEST_DATA              409612

Best Answer

This is mostly a partial answer to part 1 with some speculation. You and I know that the following query:

select TD.LAB_NUMBER 
from REQUEST_INFO RI 
LEFT JOIN TEST_DATA TD ON  TD.LAB_NUMBER = RI.LAB_NUMBER AND TD.SEQ_NUMBER = RI.SEQ_NUMBER 
where UPPER(COALESCE(TD.SUPPL_FORMATTED_RESULT,TD.FORMATTED_RESULT))='491(10)376'
  and COALESCE(TD.SUPPL_IDNUM, TD.IDNUM)=40549;

Is equivalent to this query:

select TD.LAB_NUMBER 
from REQUEST_INFO RI 
INNER JOIN TEST_DATA TD ON 
TD.LAB_NUMBER = RI.LAB_NUMBER 
AND TD.SEQ_NUMBER = RI.SEQ_NUMBER 
AND UPPER(COALESCE(TD.SUPPL_FORMATTED_RESULT,TD.FORMATTED_RESULT))='491(10)376'
and COALESCE(TD.SUPPL_IDNUM, TD.IDNUM)=40549;

However, that doesn't mean that Oracle knows that the two queries are equivalent. The equivalence of the two queries is required for Oracle to be able to use the TD_CUFR_CIDN_SN_LN index. What we're hoping for here is an OUTER JOIN to INNER JOIN conversion. I haven't had a lot of luck finding good information about this, so let's look at the explain plans:

LAB_NUMBER

Adding TD.LAB_NUMBER IS NOT NULL to the WHERE clause is a very direct way to let Oracle know that OUTER JOIN to INNER JOIN conversion is possible. We can see that it occurred by looking at the highlighted line. I think that pretty much any column will allow the conversion, although picking the wrong column may change the query results.

If we try a slightly more complicated filter such as (TD.LAB_NUMBER IS NOT NULL OR TD.SEQ_NUMBER IS NOT NULL) then the join conversion does not happen:

no join conversion

We can reason out that the OUTER JOIN is really an INNER JOIN but the query optimizer may have not been programmed to do that. In the original query, you have a COALESCE() expression which is probably just too complex for the query optimizer to apply the query transformation.

Here is a db fiddle for some of the examples.

For the second question, I'm unable to think of a way to work around this. You could try taking advantage of table elimination. As you said this query doesn't even require the REQUEST_INFO table. However, there are a few restrictions:

There are currently a few limitations of table elimination:

  • Multi-column primary key-foreign key constraints are not supported.

  • Referring to the join key elsewhere in the query will prevent table elimination. For an inner join, the join keys on each side of the join are equivalent, but if the query contains other references to the join key from the table that could otherwise be eliminated, this prevents elimination. A workaround is to rewrite the query to refer to the join key from the other table (we realize this is not always possible).

Perhaps there is a way to use that for this problem but I'm unable to work around the restrictions.