I need an advice with searching of record based on specified string.
Search strings can contain values from these columns. Values in this string don´t have to be strictly identical given in the correct order and also the values of some columns in this string may be missing.
Example of search string:
22 Karntner Wien
And I get for example result with top 5 similar records.
I think I should use full text search, but I have no experiences with it. Can you tell me how to proceed?
Best Answer
I suggest this expression for query and index:
Note the custom function
f_concat_ws()
above. That's becauseconcat_ws()
is onlySTABLE
notIMMUTABLE
. You need to create it first:It can be used as drop-in replacement for
concat_ws()
, except that it only accepts actual text data as input (which allows us to make itIMMUTABLE
without cheating, effectively). Detailed explanation (read it!):About
VARIADIC
:For many columns, this is shorter and faster. You could do without it but then the syntax gets rather verbose (see joanolo's answer).
The matching index to go with this:
You are dealing with international address data, so do not use the
english
text search configuration. Stemming makes little sense for names and most of your example data is not even English to begin with. Use thesimple
configuration instead. You need the form with two parameters - see below.Concatenate the strings and call the more expensive function
to_tsvector()
once. Useconcat_ws()
to deal with possible NULL values elegantly. Cheaper overall, and also shorter.Like I commented, Full Text Search has limited support for fuzzy matching, but there is the often overlooked feature of prefix matching:
So, if you are not sure whether it's 'Kärntner' or 'Kärnten', and whether it's 'Straße', 'strasse' or 'Strabe' (like in your buggy example data) but you know that the second word follows the first, you could:
<->
is the phrase search operator and requires Postgres 9.6.And if you want to ignore diacritical signs as well ('ä' <> 'a'), add
unaccent()
to the mix. You can use it as separate function or you can add it as dictionary to your text search configuration. You need to install the extension first ...Overview over pattern matching option in typical Postgres installations:
Joanolo already provided some basic information about FTS and the link to the manual for more.
Adressing your comment
There are two variants of the function
to_tsvector()
- see "function overloading". The 1st takes onlytext
, the 2nd takesregconfig
andtext
. See for yourself:Only the second is
IMMUTABLE
and can be used in an index expression directly. 'simple' in the above example is a text search configuration (regconfig
).More importantly, my oversight:
concat_ws()
(which I had in my first version) is onlySTABLE
, notIMMUTABLE
. I added necessary steps above.Related:
How do IMMUTABLE, STABLE and VOLATILE keywords effect behaviour of function?
Combine two columns and add into one new column