Oracle: How to create secondary index based on every word in a CLOB

indexoracleregex

I have a Table where each record is made up of a VARCHAR2 and a CLOB. The VARCHAR2 is the primary key. There are currently 1.5 million records and growing. The table is only ever used for REGEX searches on the CLOB field. The primary key is just used to prevent duplicate insertions. To speed up searching, I'd like to build a secondary index of regular words found in the CLOB field.

I was thinking that maybe another table would act as an index. The primary keys of this new table are every unique word found in the CLOB field, of every record, of the original table. The other field would be a list of pointers to the records where that word exists in the original table.

Is there a more conventional way to do this? The original table will grow to over 10 million records. How would I build a secondary index that significantly speeds up REGEX searches by first searching for any regular words found in the expression, then, after taking that smaller list of returned records, it would search using the real REGEX search?

Best Answer

You probably want to use a contextual index. Then when you search in the clob column you would do something like

myclob contains ( 'aword') > 0

This would then be a substitute for regexp. I have not used this feature in a few years, but it does take some work to create and maintain the indexes. You should look for specific examples that people created and used the index.

Indexing with Oracle Text