Sql-server – Full Text Search Term Casing Affects Results

full-text-searchsql serversql-server-2008-r2

I have been experiencing a full text problem when searching using CONTAINS. I have the full text catalog accent sensitive setting set to false.

The table contains following rows:

Microsoft C#
Microsoft Visual C
Microsoft C

When I try to get "Microsoft C#" using the following query:

select name
from fulltexttable ft with (nolock)
where contains (
        ft.name
        ,N'microsoft and c#'
        )

…it yields the following results:

Microsoft Visual C
Microsoft C

I tried using different casing. It yields the different result set:

select *
from fulltexttable with (nolock)
where contains (
        name
        ,N'microsoft and C#'
        )

Result:

Microsoft C#

The second one is what I expected. I thought SQL Server performed a case insensitive search to get the data, but it seems casing affects the full text results?

Best Answer

This is documented although I couldn't find a reference in Books Online:

The rules for characters followed by nonalphanumeric characters are somewhat convoluted (at least in English). The English word breaker accepts the token C# and returns C#. The lowercase token c#, however, is indexed as c with the # character stripped off. The uppercase token C++ and lowercase token c++, on the other hand, are both indexed as c++.

"Pro Full-Text Search in SQL Server 2008" - Michael Coles, Hilary Cotter, p106

You can check the behaviour of your search phrase using the sys.dm_fts_parser DMV, eg

SELECT * FROM sys.dm_fts_parser ( 'microsoft AND c#', 1033, 0, 0 )
SELECT * FROM sys.dm_fts_parser ( 'microsoft AND C#', 1033, 0, 0 )

As a workaround, you could force all strings through as upper-case, use some kind of mapping table for common search terms, or design a strategy to ensure non-alphanumeric characters are never stored or queried as part of the full-text index.

Note the behaviour is more consistent in SQL 2012 (where both lower and upper case will return only 'Microsoft C#').