Sql-server – SQL Server Full-Text Search: combining proximity term and thesaurus

full-text-searchsql servert-sql

Scenario

I am developing a full-text search over a products table.

The full-text indexed field of th table contains a brief description of the product, the name of the category and the product code.

I succeed in querying the table using the CONTAINS predicate with NEAR terms:

/* @mySearchTerm is a generic proximity term such as '"hdd*"~"sata*""' */
SELECT myFields
FROM SEARCHABLE_PRODUCTS
WHERE CONTAINS(myIndexedField, @mySearchTerm)

Then I try to enhance the search, leveraging on the thesaurus feature in order to enable synonyms search. E.G. 'hard disk', 'hard disk drive', 'hdd'. So I have also configured
the thesaurus xml file, and I succeed in querying the table using the FORMSOF term.

    <expansion>
        <sub>hdd</sub>
        <sub>hard disk</sub>
        <sub>hard disc</sub>
        <sub>harddisk</sub>
    </expansion>

Issue

I do not succeed in combining the NEAR and the FORMSOF terms in the CONTAINS predicate:

/* @mySearchTerm is a generic proximity term such as '"hdd*"~"sata*""' */
SELECT myFields
FROM SEARCHABLE_PRODUCTS
WHERE CONTAINS(myIndexedField, ' FORMSOF (THESAURUS, @mySearchTerm) ')

A syntax error is returned. As to the MSDN reference, it seems that it is not possible. Is there any solution or workaround?

Best Answer

I do not believe that is a supported feature of SQL Server full text searching. That would require a wildcard resolution to words and then a thesaurus lookup of each matching word to gather the thesaurus terms.

This basically maps to a pretty complex query: some one of a group of prefixed words is very near to some one of another group of prefixed words which all then go through a thesaurus lookup to provide even more words. Based on previous experience, that is just not supported.

(I see online that you have asked this elsewhere in the last few months, but without any answers, so I hope that this helps.)

I believe that you can create something useful for your query, but it probably requires externalizing the thesaurus entries by doing something like the following:

DECLARE @FirstGroup TABLE (AGroup NVARCHAR(50));
INSERT INTO @FirstGroup VALUES ('hdd');
INSERT INTO @FirstGroup VALUES ('hard disk');
INSERT INTO @FirstGroup VALUES ('hard disc');
INSERT INTO @FirstGroup VALUES ('harddisk');

DECLARE @SecondGroup TABLE (BGroup NVARCHAR(50));
INSERT INTO @SecondGroup VALUES ('sata');
INSERT INTO @SecondGroup VALUES ('msata');
INSERT INTO @SecondGroup VALUES ('esatap');

DECLARE @SearchString NVARCHAR(2000);
SET @SearchString = ''

SELECT @SearchString = @SearchString + '("'+AGroup+'"~"'+BGroup+'") OR '
FROM @FirstGroup CROSS JOIN @SecondGroup 

SELECT @SearchString = LEFT(@SearchString, LEN(@SearchString) - 3);

SELECT * FROM dbo.TBL_NAME
 WHERE contains(field_name,@SearchString);

So if you load the tables with the candidate thesaurus entries and make the OR connections in the code, it should work just fine.

You did not mention which version of SQL Server you are running, but if you are using SQL Server 2012, you might want to use the newer and more powerful syntax. However, this article also has hints about the legacy NEAR context.
http://technet.microsoft.com/en-us/library/ms142568(v=sql.110).aspx