How to convince Oracle to create optimal execution plans in SQL queries with bind variables

execution-planindexoptimizationoracleselect

An attempt to parameterize a query in Oracle using bind variables and OR :B1 IS NULL or NVL(:B1,col1) in a WHERE clause leads to suboptimal execution plans.

For example, I'd like to write SELECT like this:

SELECT * FROM MY_TABLE
WHERE (COL1 = :B1 OR :B1 IS NULL) AND (COL2 = :B2 OR :B2 IS NULL) AND (COL3 = :B3 OR :B3 IS NULL);

Its execution plan should depend on the values of the bind variables, bearing in mind that Oracle CBO creates optimal execution plans for each.

As shown in the following execution plan examples, using OR FROM NULL or NVL results in table level filter predicates instead of index level access predicates.

Is there any other way to parameterize SQL query in Oracle?

Test 1

select * from employees where employee_id = :B1;

Plan hash value: 1833546154
 
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |    69 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES     |     1 |    69 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | EMP_EMP_ID_PK |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("EMPLOYEE_ID"=TO_NUMBER(:B1))

Test 2

select * from employees where (employee_id = :B1 OR :B1 IS NULL);

Plan hash value: 1445457117
 
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     6 |   414 |    35   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEES |     6 |   414 |    35   (0)| 00:00:01 |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(:B1 IS NULL OR "EMPLOYEE_ID"=TO_NUMBER(:B1))

Test 3

select * from employees where employee_id = NVL(:B1,employee_id);   

Plan hash value: 71496665
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |   108 | 14364 |    36   (0)| 00:00:01 |
|   1 |  VIEW                          | VW_ORE_B4851255 |   108 | 14364 |    36   (0)| 00:00:01 |
|   2 |   UNION-ALL                    |                 |       |       |            |          |
|*  3 |    FILTER                      |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID| EMPLOYEES       |     1 |    69 |     1   (0)| 00:00:01 |
|*  5 |      INDEX UNIQUE SCAN         | EMP_EMP_ID_PK   |     1 |       |     0   (0)| 00:00:01 |
|*  6 |    FILTER                      |                 |       |       |            |          |
|   7 |     TABLE ACCESS FULL          | EMPLOYEES       |   107 |  7383 |    35   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(:B1 IS NOT NULL)
   5 - access("EMPLOYEE_ID"=:B1)
   6 - filter(:B1 IS NULL)

Best Answer

Test 1 is logically incorrect for this use-case. If you provide NULL as the value of :B1, it will return 0 rows.

Test 2 is logically correct, but its execution plan is suboptimal, so is performance. In this case the optimizer is unable to utilize the index, because NULL values are not indexed by default.

Test 3 is logically correct, and its execution plan is optimal, and this should be used for a simple case like this. But be very careful with this.

employee_id = NVL(:B1,employee_id) was transformed into a union of two branches collecting results based on employee_id = :B1 and :B1 is null. This is called NVL optimization and is displayed as an OR-expansion (ORE) in the execution plan. This is a nice little trick of the optimizer.

The problem is, this works properly only when you have 1 predicate like that. If you have more predicates like that, the optimizer will not do this trick for the remaining predicates, and it will not created additional branches for the UNION.

A query like this:

SELECT * FROM MY_TABLE
WHERE COL1 = NVL(:B1, COL1) AND COL2 = NVL(:B2, COL2) AND COL3 = NVL(:B3, COL3);

will not produce branches for all path. The optimizer will perform the NVL optimization for 1 predicate, then it will place the remaining predicates in the 2 branches created by the transformation of the first predicate. This means a plan generated from a query like that will never be optimal. This is something that causes a big share of SQL performance problems I encounter, and a not a single developer was aware of this so far. If you want to have optimal execution plans and performance, you should generate the simplest SQL statement on the fly instead of trying to cover all possible scenarios with a single SQL statement. For example, if there was no value provided for :B1, then simply do not include that predicate in the SQL statement.