Case insensitive search of CLOBs in Oracle 12

oracleoracle-12c

Question: What is the most performant way to do a case-insensitive search in a CLOB column?

I've simplified the query to easily reproduce the problem. The filedata table can have millions of rows. Many rows can have the same fileid. The fileid column is indexed. In cases where a particular fileid matches many rows, the proc is really slow.

select * from filedata where fileid=2148102100 AND REGEXP_LIKE (dataClobfield, 'Some Text Here', 'i');

This query completes in 130 seconds. 16 rows are returned by this query, but there 400k rows matching the fileid value (in other words, if I removed the regexp_like where clause, I would get 400k with a fileid of 2148102100).

If I change this to

select * from filedata where fileid=2148102100 AND dataClobfield like '%Some Text Here%';

It completes in 32 seconds but the search is case-sensitive, which does not meet the requirements.

I've also tried

select * from filedata where fileid=2148102100 AND LOWER(dataClobfield) like '%some text here%';

This returns in 83 seconds… better than the regexp_like, but still not great.

Other things I've read about haven't worked: setting the session NLS_SORT=BINARY_CI (doesn't work with CLOBs), using the DBMS_LOB.instr function (case sensitive). Ideally I'd like to get this down to 30 seconds. Any ideas?

Best Answer

Searching without any kind of index seems to be not a good idea. And an analysis without further details is difficult. But you can add resources to improve your speed. It seems that the bottleneck of your query is CPU and not the disks. If disk speed is the limitation there would not be a difference between regexp and LIKE.

  1. You can try to parallelize your query. If the work is done by 4 processes instead of one, it is possible that the query response tim is a quarter of the original time.

  2. you can add an additional clob field that contains the content of dataClobfield in uppercase. this field is used for searching the records you are interested in