Indexing strategy for VARCHAR2 LIKE search

indexindex-tuningoptimizationoracle

Scenario

Database

  • Oracle Database 11g

Table

  • Users
    • ID Number, PK
    • CODE Number, Indexed, Unique, Not null
    • FIRST_NAME VARCHAR(100), Not null
    • LAST_NAME VARCHAR(100), Not null
    • EMAIL VARCHAR(50), Not null

Search

  • Except for ID and CODE, all other columns has mixed case.
  • I need to search a user by an string. The search should be case insensitive.
  • The string is a partial value. (ex: 'foo' to search for 'foobar')
  • The pattern can be present on any part of a value on the colum. (ex: 'ob' for foobar)
  • The pattern can be present on any of the table fields.

SQL

SELECT *
FROM users u
WHERE 
u.code = COALESCE( to_number( REGEXP_REPLACE(:pattern, '[^0-9]+', '') ), -1)
OR upper(u.email) LIKE upper(:pattern)
OR upper(u.last_name) LIKE upper(:pattern) 
OR upper(u.first_name) LIKE upper(:pattern) 

Query Plan

  • Results show a full table scan with an OR Filter Predicate

Question

  1. What is the best strategy to optimize this query? Can we reduce the cost somehow?
  2. Is beneficial to use an index on the VARCHAR2 column(s)? Even with the search using the LIKE operator?
  3. Can we use a different SQL(s) to take advantage of the indexes and join the results at the end?

Text Search Engines

  • I'm aware that we could use an text search engine to handle that. But let's discuss what can be done using the database.

Related Links

Best Answer

No, forget LIKE operator for this, conditions as first_name LIKE '%name%' will never be effective. A regular index can not be used for range scans for conditions as first_name LIKE '%name%'. If first_name is indexed, index full scan is the best the database can use. And you still need to handle case-sensitity.

(index range scan can be used for conditions as first_name like 'name%', where the beginning of the value is fixed)

This is a task for Oracle Text.

No, forget virtual columns, you can not create text indexes on virtual columns, this is what you get if you try it:

DRG-11304: function-based indexes are not supported by this indextype

I have answered a similar question: Oracle “matches” multiple columns

But here it is again, with an example:

create table t1
(
  id number primary key,
  code number not null,
  first_name varchar2(100 char) not null,
  last_name varchar2(100 char) not null,
  email varchar2(50 char) not null
);

insert into t1 values (1, 111, 'Balazs', 'Papp', 'dba@stackexchange.com');
insert into t1 values (2, 222, 'linux', 'unil', 'index@stackexchange.com');
commit;

begin
ctx_ddl.create_preference('t1_multi_column_datastore', 'multi_column_datastore');
ctx_ddl.set_attribute('t1_multi_column_datastore', 'columns', 'code, first_name, last_name, email');
ctx_ddl.create_preference('t1_lexer', 'basic_lexer');
ctx_ddl.set_attribute('t1_lexer', 'mixed_case','no');
end;
/

create index i1 on t1 (first_name) indextype is ctxsys.context
  parameters ('lexer t1_lexer datastore t1_multi_column_datastore');

Then the query:

SQL> select * from t1 where contains (first_name, '%STACK%') > 0;

        ID       CODE FIRST_NAME LAST_NAME  EMAIL
---------- ---------- ---------- ---------- -------------------------
         1        111 Balazs     Papp       dba@stackexchange.com
         2        222 linux      unil       index@stackexchange.com

SQL> select * from t1 where contains (first_name, '%11%') > 0;

        ID       CODE FIRST_NAME LAST_NAME  EMAIL
---------- ---------- ---------- ---------- -------------------------
         1        111 Balazs     Papp       dba@stackexchange.com

SQL> select * from t1 where contains (first_name, '%Uni%') > 0;

        ID       CODE FIRST_NAME LAST_NAME  EMAIL
---------- ---------- ---------- ---------- -------------------------
         2        222 linux      unil       index@stackexchange.com

Notice how I did not specify all the columns, still the query searched in all of them (enumerated in the multi_column_datastore). I specified first_name in the index and the query, yet the query returned records that matched on the email, code or last_name column.

Notice how I did not specify anything related to case-sensitivity in the query. It was specified with the mixed_case=no property (Oracle Text word searches are not case-insensitive by default: Case-Sensitive Searching).

And the query uses the index:

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID  a4vw67psmh4x3, child number 0
-------------------------------------
select * from t1 where contains (first_name, '%Uni%') > 0

Plan hash value: 1218792127

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1   |     1 |   544 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | I1   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("FIRST_NAME",'%Uni%')>0)