Postgresql – How to search hyphenated words in PostgreSQL full text search

Tags: full-text-search, pattern matching, postgresql

I have to search for hyphenated words like 'good-morning', 'good-evening', etc.

My query is:

select id, ts_headline(content,
                       to_tsquery('english','good-morning'),
                       'HighlightAll=true MaxFragments=100 FragmentDelimiter=$')
from tbl  -- "table" is a reserved word, so a placeholder name is used here
where ts_content @@ to_tsquery('english','good-morning');

When executing this query I also get results of 'good' and 'morning' separately. But I want exactly matching words and fragments.
(For ts_content I used the same default config english to create the tsvector.)

How can I search such hyphenated words in PostgreSQL full text search?

Best Answer

The key word here is phrase search, introduced with Postgres 9.6.

Use the tsquery FOLLOWED BY operator <-> or one of the related <N> operators. Or better yet, use the function phraseto_tsquery() to generate your tsquery.
Quoting the manual, it ...

"produces tsquery that searches for a phrase, ignoring punctuation"

And:

"phraseto_tsquery behaves much like plainto_tsquery, except that it inserts the <-> (FOLLOWED BY) operator between surviving words instead of the & (AND) operator. Also, stop words are not simply discarded, but are accounted for by inserting <N> operators rather than <-> operators. This function is useful when searching for exact lexeme sequences, since the FOLLOWED BY operators check lexeme order not just the presence of all the lexemes."
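
To see the difference between the operators, it can help to compare what the query-building functions produce for the same input. This is a minimal sketch, assuming Postgres 9.6 or later and the built-in english configuration. The printed form of each tsquery can differ between versions, so no output is shown here.

```sql
-- Compare the tsquery built by each approach for the same two words.
SELECT to_tsquery('english', 'good & morning')     AS manual_and       -- both lexemes, any position
     , to_tsquery('english', 'good <-> morning')   AS manual_followed  -- second lexeme directly after the first
     , plainto_tsquery('english', 'good morning')  AS plain            -- words joined with & (AND)
     , phraseto_tsquery('english', 'good morning') AS phrase;          -- words joined with <-> (FOLLOWED BY)
```

The last two columns take plain text, so no operator syntax has to be escaped in user input. That is one reason to prefer phraseto_tsquery() over hand-written operators.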

Your query would work like this:

select id
     , ts_headline(content, phraseto_tsquery('english', 'good-morning')
                          , 'HighlightAll=true MaxFragments=100 FragmentDelimiter=$') 
from   tbl 
where  ts_content @@ phraseto_tsquery('english','good-morning');

phraseto_tsquery('english', 'good-morning') generates this tsquery:

'good-morn' <-> 'good' <-> 'morn'

Since "good-morning" is identified as asciihword (hyphenated ASCII word), the stemmed complete word is added before the components. The manual:

"It is possible for the parser to produce overlapping tokens from the same piece of text. As an example, a hyphenated word will be reported both as the entire word and as each component" (followed by an example in the manual).

to_tsvector() basically does the same on the other end, so everything matches up. This allows for fine-grained options with hyphenated words. The above only finds "good-morning" with a hyphen (or variants stemming to the same). To find all strings with "good" followed by "morn" (or variants stemming to the same) use phraseto_tsquery('english','good morning') generating this tsquery: 'good' <-> 'morn'
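
The overlapping tokens and the two query variants can be checked without a table. This is a rough sketch, assuming the built-in english configuration; the inline sample strings are made up for illustration.

```sql
-- Show how the parser splits the hyphenated word: the whole word plus its parts.
SELECT alias, token, lexemes
FROM   ts_debug('english', 'good-morning');

-- Test sample strings against the hyphenated query and the plain phrase query.
SELECT doc
     , to_tsvector('english', doc) @@ phraseto_tsquery('english', 'good-morning') AS hyphenated_query
     , to_tsvector('english', doc) @@ phraseto_tsquery('english', 'good morning') AS plain_phrase_query
FROM  (VALUES ('good-morning'), ('good morning'), ('morning good')) AS t(doc);
```

Going by the explanation above, only the hyphenated string should satisfy the first query, while both 'good-morning' and 'good morning' should satisfy the second. 'morning good' should match neither, because the lexeme order is wrong.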

OTOH, you can enforce exact matches by adding another filter like:

...
AND content ~* 'good-morning'  -- case insensitive regexp match

Or:

...
AND content ILIKE '%good-morning%'

Seems a bit redundant to the human eye, but this way you get fast full text index support and exact matches.

The ILIKE variant is mostly equivalent, but different (and fewer) characters have special meaning in a LIKE pattern and might need escaping.
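
Put together, the combined query might look like the sketch below. The GIN index and its name tbl_ts_content_idx are assumptions for illustration; the original post does not say how ts_content is indexed.

```sql
-- Assumption: a GIN index on the tsvector column, so the @@ condition can use it.
CREATE INDEX IF NOT EXISTS tbl_ts_content_idx ON tbl USING gin (ts_content);

SELECT id
     , ts_headline(content, phraseto_tsquery('english', 'good-morning')
                          , 'HighlightAll=true MaxFragments=100 FragmentDelimiter=$')
FROM   tbl
WHERE  ts_content @@ phraseto_tsquery('english', 'good-morning')  -- index-supported pre-filter
AND    content ILIKE '%good-morning%';                            -- exact, case-insensitive substring match
```

The full text condition narrows the candidate rows cheaply, and the pattern match then only has to run on those rows.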

Example to demonstrate the operator <N>:

phraseto_tsquery('english', 'Juliet and the Licks') generates this tsquery:

'juliet' <3> 'lick'

<3> meaning that lick must be the third lexeme after juliet.
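
The distance operator can also be written by hand and tested against sample text. This is a small sketch, assuming the built-in english configuration, in which "and" and "the" are stop words; the second sample string is made up for contrast.

```sql
-- Stop words are dropped from the tsvector, but they still count towards positions.
SELECT doc
     , to_tsvector('english', doc)                                             AS vec
     , to_tsvector('english', doc) @@ to_tsquery('english', 'juliet <3> lick')   AS matches_distance_3
     , to_tsvector('english', doc) @@ to_tsquery('english', 'juliet <-> lick')   AS matches_adjacent
FROM  (VALUES ('Juliet and the Licks'), ('Juliet Licks')) AS t(doc);
```

The first string should match only the <3> query and the second only the <-> query, since <-> is the same as <1>.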

Related Solutions

Mysql – Multi-Column Full Text Search Going Very Slow

PROBLEM

From the posts in your question, I see 3 FULLTEXT indexes. There is one for each column.

Why did the query work at all? MySQL worked with whatever it had. In your case, it fell back to a full table scan, because that is what the MySQL query optimizer decided on.

SOLUTION

What you really need is a single FULLTEXT index covering all 3 columns:

ALTER TABLE articles ADD FULLTEXT content_title_keywords_ndx (content,title,keywords);

Only then can you say:

match(content,title,keywords) against ('cats' in boolean mode)
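
As a complete statement, that condition might be used as sketched below. This is MySQL syntax; the table and column names are taken from the ALTER TABLE statement above, and selecting all columns is only for illustration.

```sql
-- The column list in MATCH() mirrors the column list of the compound FULLTEXT index.
SELECT *
FROM   articles
WHERE  MATCH(content, title, keywords) AGAINST ('cats' IN BOOLEAN MODE);

-- Check the plan to see whether the FULLTEXT index is chosen instead of a full table scan.
EXPLAIN
SELECT *
FROM   articles
WHERE  MATCH(content, title, keywords) AGAINST ('cats' IN BOOLEAN MODE);
```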

I have suggested making compound FULLTEXT indexes before:

Mar 16, 2012 : Speed up search across multiple columns
Oct 13, 2012 : Can underscore be forced as a word splitter without a full-text parser plugin?
All my posts about FULLTEXT indexing and searching
