PostgreSQL – TSQuery to Search Number and UoM in Text

full-text-search postgresql

I'm a beginner with full-text search, so I'm learning the feature in psql by following a tutorial I found here. I modified the sample data to suit my own needs:

INSERT INTO documents (document_text) VALUES  
('Pack my box with five inches liquor jugs.'),
('Jackdaws love my big sphinx of quartz.'),
('The 5" wizards jump quickly.'),
('How vexingly quick daft zebras jump!'),
('Bright vixens jump; dozy fowl quack.'),
('Sphinx of black quartz, judge my vow.');

Then I tried searching with several keyword combinations, hoping to get documents #1 and #3, such as:

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('5"'); 

output: document #3 only

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('5&inch'); 

output: null

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('5&in'); 

output: document #3 only

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('five&in'); 

output: document #1 only

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('inch&five'); 

output: document #1 only

Since 5", 5 in, and five inches naturally all mean the same thing, why does FTS give different results (it never returns both documents #1 and #3)? If I'm using FTS the wrong way, please show me the right way to implement it. Or, if I need a different search engine to solve my problem, which one should I use?

Best Answer

FTS is very clever in some ways, but not very clever in other ways. It is not clever (by default) in the way you want it to be.
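
To see what FTS is actually comparing, it can help to look at the tokens directly. The queries below are only a sketch: they assume the tutorial's document_tokens column was built with the english configuration, which the question does not state.

```sql
-- Assumption: the 'english' text search configuration; substitute your own.

-- Lexemes stored for document #3. The " is not kept as a searchable lexeme.
SELECT to_tsvector('english', 'The 5" wizards jump quickly.');

-- Lexemes stored for document #1. 'inches' is stemmed to 'inch';
-- 'five' stays a word and is not turned into the number 5.
SELECT to_tsvector('english', 'Pack my box with five inches liquor jugs.');

-- 'in' is an English stop word, so it is dropped from the query and only
-- '5' remains. That is why '5&in' behaves like a search for '5' alone.
SELECT to_tsquery('english', '5&in');

-- Token-by-token view of how the parser and dictionaries treat the text.
SELECT alias, token, lexemes
FROM ts_debug('english', 'The 5" wizards jump quickly.');
```

Under that assumption, nothing in the stored lexemes connects '5' to 'five' or the " symbol to 'inch', so none of the queries in the question can match both rows.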

You can set up a thesaurus to map words to synonyms, but I don't know if there is any way to get your hands on the " symbol in order to map it. And of course it is unlikely to make sense to convert every ", or for that matter every 'in', into 'inches'.
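
As a rough sketch of the word-mapping idea, PostgreSQL's built-in synonym dictionary template can map a word such as five to 5 before stemming. Every name here (uom_syn, uom_english, the uom_synonyms.syn file) is hypothetical, the file has to be created by hand in the server's tsearch_data directory, and none of this addresses the " symbol.

```sql
-- Assumption: a file $SHAREDIR/tsearch_data/uom_synonyms.syn (hypothetical
-- name) exists on the server and contains one "word synonym" pair per line:
--   five 5

-- Synonym dictionary backed by that file.
CREATE TEXT SEARCH DICTIONARY uom_syn (
    TEMPLATE = synonym,
    SYNONYMS = uom_synonyms
);

-- Work on a copy so the stock english configuration stays untouched.
CREATE TEXT SEARCH CONFIGURATION uom_english (COPY = pg_catalog.english);

-- For plain ASCII words, consult the synonym dictionary first and fall
-- back to the usual English stemmer for everything it does not recognize.
ALTER TEXT SEARCH CONFIGURATION uom_english
    ALTER MAPPING FOR asciiword WITH uom_syn, english_stem;

-- Check the effect on a sample phrase.
SELECT to_tsvector('uom_english', 'five inches');
```

The document_tokens column would then have to be rebuilt with to_tsvector('uom_english', ...) and queried with to_tsquery('uom_english', ...). Even so, a query for 5 & inch would still miss document #3, because its text never yields an inch lexeme.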