Sql-server – What characters are word breakers in English for SQL Server 2005 and 2008 R2

full-text-searchsql-server-2005sql-server-2008-r2

I can find what DLL supports English word breakers by using sp_help_fulltext_system_components but I have not been able to find an actual list of the word breaking characters for English (like blank, ., %, etc.).

Anyone know of a source for this info?

Best Answer

This isn't an official list, but using a loop to work through a list of characters, and using sys.dm_fts_parser like so:


declare @i integer
declare @cnt integer
set @i=0
while @i<255
begin
  set @cnt=0
  select @cnt=COUNT(1) FROM sys.dm_fts_parser ('"word1'+REPLACE(CHAR(@i),'"','""')+'word2"', 1033, 0, 0)
  if @cnt>1
    begin
      print 'this char - '+CASE WHEN @i > 31 THEN char(@i) ELSE '' END+' - char('+convert(varchar(3),@i)+') is a word breaker'
    end
  set @i=@i+1
end 

I can generate a list of characters that sys.dm_fts_parser reckons break the words. (sys.dm_fts_parser returns a row for every 'word' found in the import, so if it returns more than 1 row we had a word breaker)

This could be expanded to extended/non-english character sets by using nchar() rather than char() (and a bigger value for @i), and changing parameter 2 (lcid) in the call to sys.dm_fts_parser