SQL Server – Difference in Fulltext Results: CONTAINS vs FREETEXT

full-text-searchsql serversql-server-2008-r2

I have a table with an fulltext index on the column named Filecontent.
The table has a row where content contains "W 917". For context, the content column on this rows contains much more than just what I'm searching for.

I don't understand why I'm getting different results depending on whether I'm using contains, contains with wildcard or freetext.
Why is CONTAINS without wildcard getting results, but CONTAINS with wildcard doesn't?

-- Searching for "W 917"
-- No match - CONTAINS with wildcard
SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W 917*"')

-- Match - CONTAINS
SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W 917"')

-- Match - FREETEXT
SELECT * FROM InvoicePDFContent t1 WHERE FREETEXT(t1.Filecontent, '"W 917"')


-- Searching for "W"
-- Match - CONTAINS with wildcard
SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W*"')

-- No match - CONTAINS
SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W"')

-- No match - FREETEXT
SELECT * FROM InvoicePDFContent t1 WHERE FREETEXT(t1.Filecontent, '"W"')
ยดยดยด

Best Answer

The difference is caused by noise word "W". By default SQL Server uses stoplist to exclude some short or frequently used words like "and", "does", "could". You can see the full list using such query:

select * from sys.fulltext_system_stopwords where language_id = 1033

And you can disable the stoplist for your full text index if you want.

Now let's take a look at your queries:

SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W*"')

this one looks for all rows which contain words starting with "W" and these words shouldn't be in your stoplist.

SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W"')

this query returns nothing because you are looking for a stop word.

SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W 917"')

this query just ignores the W and looks for "917".

SELECT * FROM InvoicePDFContent t1 WHERE CONTAINS(t1.Filecontent, '"W 917*"')

This is the most interesting case. According to SQL Server documentation:

When <prefix_term> is a phrase, each word contained in the phrase is considered to be a separate prefix. Therefore, a query specifying a prefix term of "local wine*" matches any rows with the text of "local winery", "locally wined and dined", and so on.

It means actually you are looking for phrases which have two words in a row. The first one is like "W*" (but it shouldn't be in stoplist) and the second one is "917*".