Postgresql a strange tsquery behavior

full-text-searchpostgresql

select  to_tsvector('english', 'Ice-cream') @@ to_tsquery('english', 'Ice<->cream');

is True. While:

select  to_tsvector('english', 'iDream  Ice-cream  iScream') @@ to_tsquery('english', 'iDream<->Ice<->cream<->iScream');

is False.

For my understand. Both add same word before and after the match one. It will keep the same answer.

Best Answer

That's because full-text search treats hyphenated words specially:

SELECT to_tsvector('english', 'iDream  Ice cream  iScream');

               to_tsvector                
------------------------------------------
 'cream':3 'ice':2 'idream':1 'iscream':4
(1 row)

The numbers behind the lexemes mark the position they had in the original text (cream is the third word, and so on). That is used for phrase search.

SELECT to_tsvector('english', 'iDream  Ice-cream  iScream');

                      to_tsvector                       
--------------------------------------------------------
 'cream':4 'ice':3 'ice-cream':2 'idream':1 'iscream':5
(1 row)

You see that the original hyphenated word is at the second position, and the parts are represented as following the hyphenated word.

So ice cream is not the same as ice-cream for PostgreSQL full text search. In the first case, ice immediately follows idream, but not in the second case. That is why your query returns FALSE.

Look at what the parser does:

SELECT alias, token, lexemes FROM ts_debug('english', 'iDream  Ice-cream  iScream');

      alias      |   token   |   lexemes   
-----------------+-----------+-------------
 asciiword       | iDream    | {idream}
 blank           |           | 
 asciihword      | Ice-cream | {ice-cream}
 hword_asciipart | Ice       | {ice}
 blank           | -         | 
 hword_asciipart | cream     | {cream}
 blank           |           | 
 asciiword       | iScream   | {iscream}
(8 rows)

Perhaps the solution you are looking for would be to ignore hyphenated words and just keep their parts:

CREATE TEXT SEARCH CONFIGURATION en_no_hyphen
   (COPY = english);

ALTER TEXT SEARCH CONFIGURATION en_no_hyphen
   DROP MAPPING FOR asciihword, hword;

SELECT to_tsvector('en_no_hyphen', 'iDream  Ice-cream  iScream')
       @@ to_tsquery('en_no_hyphen', 'iDream<->Ice<->cream<->iScream');

 ?column? 
----------
 t
(1 row)