SQL Server – FullTextSearch Indexing and Searching Words with Special Characters

full-text-searchsql serversql server 2014

I'm trying to query FullTextSearch indexed column for hashtags

SELECT Bio FROM Users
WHERE CONTAINS (Bio,'promoter')

SELECT Bio FROM Users
WHERE CONTAINS (Bio,'#promoter')

both results returns the same record

promoter of #Art

I tried also to put double quotes around the searched term "#promoter" but same result.

The same problem when I'm searching for '@something'

What I need to do in order FullTextSearch to index the special characters?

Best Answer

Hmm I'm afraid '#' counts as punctuation and SQL Server Full-Text Indexing was invented before Twitter. There are however a couple of approaches:

1) Pre-processing

Use the full-text functions to fetch most of the data then refine it with Like, eg

SELECT Id
INTO #tmp
FROM dbo.Users
WHERE CONTAINS ( Bio, '#promoter' )

SELECT *
FROM dbo.Users u
WHERE u.Bio Like '%#promoter%'
AND EXISTS ( SELECT * FROM #tmp t WHERE u.Id = t.Id )

Whilst you could use CONTAINSTABLE to do the same thing in one query, you can't necessarily guarantee what order the plan will take - and why would you, T-SQL is after all declarative. I prefer to be explicit, but if you find performance acceptable and/or data volumes are low, combine the two, eg

SELECT *
FROM dbo.Users u
    INNER JOIN CONTAINSTABLE ( Users, Bio, '"#promoter"' ) ct ON u.Id = ct.[key]
WHERE u.Bio Like '%#promoter%'

I did some simple tests on a 1 million row table and it seemed to perform ok. YMMV.

2) Tokenised version of column

Store an alternate version, eg a computed column where '#' is replaced with the literal text 'hashtag', something like this:

-- Add computed column
ALTER TABLE dbo.Users ADD cleanBio AS REPLACE ( Bio, '#', 'hashtag' )

-- Add the computed column to full-text index
ALTER FULLTEXT INDEX ON dbo.Users ADD ( cleanBio ) 
GO

SELECT * FROM dbo.Users
WHERE CONTAINS ( cleanBio, 'hashtagpromoter' )

Obviously there is some storage overhead here.

Consider trying these approaches with your data and let me know how you get on.