Can Oracle Text query return the terms found

full-text-searchoracleoracle-10g

I am querying an Oracle 10g database. The database contains a table with documents that has an oracle text index. I query the database using a query like

SELECT SCORE(1), title from news WHERE CONTAINS(text, 'people OR country ', 1) > 0; 

Is there a way to see what terms in the "contains" statement that Oracle found in the documents?

Best Answer

You might want to look into the MARKUP or HIGHLIGHT functions of the CTX_DOC package. An example of how MARKUP can be used for your query (this assumes your index is called idx_news and you have an ID column):

declare
  MARKLOB   clob;
  MARK1     number;
  MARK2     number;

  cursor NEWS_SEARCH is
    select * from NEWS;
begin
  for REC in NEWS_SEARCH loop
    CTX_DOC.MARKUP('idx_news', to_char(REC.ID), 'people OR country', MARKLOB);
    MARK1 := instr(MARKLOB, '<<<');
    MARK2 := instr(MARKLOB, '>>>');

    if MARK1 + MARK2 <> 0 then
      dbms_output.
       PUT_LINE(
        REC.ID || ',' || substr(MARKLOB, MARK1 + 3, MARK2 - MARK1 - 3));
    end if;
  end loop;
end;
/

What it does is mark up your text (surround the found items with <<<>>>) and outputs that new text to a clob which is then searched for the markup. All text that is found in your query will be marked up but my PL/SQL will only display the first result; it can be adapted to find them all and display only the unique ones. This is a simple example and perhaps can become too complicated for your original query but it's here to point the way.


Alternatively, a simple SELECT may suffice if all you have is an OR in your text query:

select decode(sign(SCORE(1))
             ,1, decode(sign(SCORE(2)), 1, 'both', 'people')
             ,'country')
         as WHICH
      ,TITLE
  from NEWS A
 where CONTAINS(TITLE, 'people', 1) > 0 or CONTAINS(TITLE, 'country', 2) > 0;

This checks the score for individual searches on 'people' and 'country' and works out which ones were positive, returning 'both' if both search terms were found.