Sql-server – Full text search using words stored in another table

full-text-searchsql serversql-server-2012t-sql

Is there a way to do a full-text search using words stored in another table? This is my scenario:

  • Table_A contains phrases entered by users in field "UserInput" (and other stuff). I have a full-text index on this table looking at UserInput.
  • Table_B with one field "Word" containing a bunch (more than a 1k) words that I want to make sure my users are not using.

I tried this query:

Select * 
From Table_A 
  Cross Join Table_B 
Where Contains(UserInput, Word)

But that does not work. The error says "Incorrect syntax near 'Word'". SSMS also complains saying "Expecting STRING, TEXT_LEX or VARIABLE".

I have also tried CHARINDEX and LIKE in the WHERE statement but those functions return false positives as they do not look for whole words but strings within strings.

If full-text will not work, is there any other approach that can help identify whole words from Table_B.Word in Table_A.UserInput?

UPDATE:

Also tried the following:

Declare @AllWords nvarchar(4000) = '';
Select @AllWords = @AllWords + Case When @AllWords = '' Then Word Else ' OR ' + Word End
From Table_B;
Set @AllWords = '''' + @AllWords + '''';
Select *
From Table_A
Where Contains(UserInput, @AllWords);

This one seems to bring up the right results but I don't have a way to know what words were found in Table_A.UserInput.

Best Answer

  1. To reiterate, you currently have a table with a list of words you want to make sure your users are not searching Table_A with.

  2. You have successfully been able to query Table_A with Table_B which contains your list of bad words.

  3. Your end goal going from the first point, to the second point, is that you want to know if they searched for those bad words, what would be returned. What bad word correlates to a return of a record in Table_A.

What you need to do to resolve point 3 is query sys.dm_fts_parser. There is an excellent Example by Jefferson Elias on SQL Shack. I'll post the relevant bit in case it is removed or broken by the internet later:

How to check the results of a Full-Text parsing

There are two ways to check how Full-Text feature parses a given text depending on the source of the text.

Source of the text is a String

If you want to check fast what keywords you would get for a particular string, you might want to use sys.dm_fts_parser built-in function.

Here is an example of call to that function.

The first parameter is the string that has to be parsed. The second parameter is the language identifier. Here, it’s set to 0, which means it’s neutral. The hhird parameter is the identifier of the stoplist. Here no stoplist is used. The last parameter tells this function whether to be sensitive or not to accents. Here, we asked for insensitivity. In other words, this function will take the information you would provide when creating a Full-Text Index.

 select * from sys.dm_fts_parser(
    '" dsolkjfdskljfsd dfsd-MMM-236.127 dojfdslfkjds"',
    0,
    NULL,
    0
) ;

If a table is already created with a Full-Text index, we would use another dynamic management function (DMF) called sys.dm_fts_index_keywords which takes as a parameter:

The database identifier in which it should look at The object identifier in that database It returns a dataset with a hexadecimal representation of the keyword, its corresponding form in the plain text, the identifier of the column in which the keyword has been found and finally the number of documents where this keyword can be found.

You will find below a T-SQL query to get back keywords found by Full-Text feature in our dbo.DM_OBJECT_FILE table so as its results set.

select * 
From sys.dm_fts_index_keywords(DB_ID(),OBJECT_ID('dbo.DM_OBJECT_FILE'))

This method should show you what would be returned based on your criteria. To help with point 1, you may want to look into Stopwords, Stoplists, and the Thesaurus features to help change the words from Table_B into usable words. Alternatively, you may need to implement triggers to keep the terms from being searched.

I imagine this code would work:

Declare @AllWords nvarchar(4000) = '';
Select @AllWords = @AllWords + Case When @AllWords = '' Then Word Else ' OR ' + Word End
From Table_B;
Set @AllWords = '''' + @AllWords + '''';
select * from sys.dm_fts_parser(
    @AllWords,
    0,
    NULL,
    0
) ;