Sql-server – Full text index search returns an unexpected result

full-text-searchsql serversql-server-2008

When I search TaxIdentityNumber using contains an unexpected result is returned; the result contains numbers that are not matched to query. For example, I search using '"5689523657*"' and get 5681254870, 5458976235, etc. as result. I rebuilt the index, but get the same result. What should I try? What did I miss?

The full query:

SELECT *
FROM [Contractors] [contractor] 
INNER JOIN [ReportDates] [reportDate]
    ON [contractor].[ReportDateId] = [reportDate].[Id] 
LEFT JOIN [Persons] [tempPersons]
    ON [contractor].[PersonId] = [tempPersons].[Id] 
LEFT JOIN [Organizations] [tempOrganiztions]
    ON [contractor].[OrganizationId] = [tempOrganiztions].[Id] 
LEFT JOIN [Organizations] [tempHolding]
    ON [contractor].[HoldingId] = [tempHolding].[Id] 
WHERE
    CONTAINS(([contractor].[TaxIdentityNumber]), '"5689523657*"')

Running on SQL Server 2008 Standard Edition. Query to create full text index:

CREATE FULLTEXT CATALOG SPFTI;
GO
CREATE FULLTEXT INDEX ON [Contractors]
(FullName, TaxIdentityNumber, RelatedOrganizationFullName) 
KEY INDEX ContractorPK
ON SPFTI; 

Best Answer

This is a known issue when using SQL Server 2008 to search for a number with more than nine digits. When using certain word-breakers, the number is stored in its scientific form e.g. nn5d68952e+009, leading to false-positive matches. One of the word-breakers affected is the neutral language (lcid 0):

-- Result includes nn5d68952e+009
SELECT display_term 
FROM sys.dm_fts_parser('"5689523657*"', 0, 0, 0);

A language that does not behave this way is us-english (lcid 1033):

-- Result includes nn5689523657
SELECT * 
FROM sys.dm_fts_parser('"5689523657*"', 1033, 0, 0);

There are several other workarounds and fixes mentioned in the KB article link above.