Sql-server – Abbreviations in full text search index

full-text-searchsql serversql-server-2012

I have a text column containing various words that have been shortened into abbreviations. For example, instead of "insured", the column might contain "insd". I understand I can use a thesaurus file to create a list of synonyms, effectively allowing a search for "insd" to return rows that contain both "insured" and "insd". Perfect.

However, some of the other abbreviations in the column contain "special" characters, like a slash or an ampersand, such as:

t/p - third party
o/s - other side
p/p - per person
i&o - inside and out

Is there some way I can make CONTAINS or FREETEXT understand queries containing these words? For instance:

SELECT *
FROM dbo.MyTable
WHERE FREETEXT(MyColumn, 't/p');

I've created the thesaurus file, tsenu.xml in the correct location for the instance:

<XML ID="Microsoft Search Thesaurus">
    <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>insured</sub>
            <sub>insd</sub>
        </expansion>
        <expansion>
            <sub>t/p</sub>
            <sub>third party</sub>
        </expansion>
        <expansion>
            <sub>o/s</sub>
            <sub>other side</sub>
        </expansion>
        <expansion>
            <sub>p/p</sub>
            <sub>per person</sub>
        </expansion>
    </thesaurus>
</XML>

And loaded it using EXEC sys.sp_fulltext_load_thesaurus_file 1033;, however the query returns unpredictable results.

Best Answer

When you want to add words involving special characters that you want treated as a single word, you will want to create a custom dictionary.

Creating Custom Dictionaries for special terms to be indexed ‘as-is’ in SQL Server 2008 Full-Text Indexes

Be aware that custom dictionaries are limited in size, plus the larger the dictionary the slower fulltext can become.

Custom dictionaries are great when you have a limited set of words to add like AT&T or M&Ms which are words with special characters that might be used over and over that you wanted treated as a single word instead of the default word breaking behavior.

Custom dictionaries are not great for for words like part numbers (ex. JY-14562, PTW-14432, etc.) where every row in a table would result in a new entry in the custom dictionary. You could potentially have several hundred thousand or more unique part numbers. That would not be a good use for custom dictionary. If you need this type of behavior, you are better off writing your own word breaker. Creating word breakers is not well documented nor easy to create, but it does allow you to decide how you want words broken when you encounter special characters.