SQL Server 2008: What are the English word breakers?

full-text-search sql-server-2008

I'm trying to replicate what SQL Server 2008 is doing when generating full-text indexes, but I'm having a hard time finding a list of which characters SQL Server 2008 considers to be word breakers.

Is there a list out there specifying which characters SQL Server 2008 will break words on for full-text indexes for English?

Best Answer

The per-language noise lists (also called stop lists) and thesaurus files are in the MSSQL/FTData/ folder. The noise files are plain text and the thesaurus files are XML, so both can be inspected or changed easily.

The "Configuring Full-Text Linguistic Components" topic in Books Online (BOL) has the details.
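
If you want to inspect those files from a script rather than a text editor, something like the following could be a starting point. It is only a rough sketch: it assumes the noise file is a whitespace-separated word list and that the thesaurus XML uses `expansion`, `replacement`, `pat` and `sub` elements, as in the sample string below. The sample contents and the function names are made up for illustration, so check them against the actual files on your server, including their encoding, before relying on this.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data only; real files on a server may differ.
SAMPLE_NOISE = "a about after\nall also an\n"

SAMPLE_THESAURUS = """<XML ID="Microsoft Search Thesaurus">
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <expansion>
      <sub>Internet Explorer</sub>
      <sub>IE</sub>
    </expansion>
    <replacement>
      <pat>W2K</pat>
      <sub>Windows 2000</sub>
    </replacement>
  </thesaurus>
</XML>"""


def parse_noise_words(text):
    """Return the noise words as a lowercase set.

    Assumes entries are separated by spaces or newlines.
    """
    return {word.lower() for word in text.split()}


def _local(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_thesaurus(xml_text):
    """Collect expansion sets and replacement rules from thesaurus XML.

    Returns (expansions, replacements), where expansions is a list of
    lists of equivalent terms and replacements is a list of
    (patterns, substitutes) tuples. Element names are assumptions
    based on the sample above.
    """
    root = ET.fromstring(xml_text)
    expansions, replacements = [], []
    for elem in root.iter():
        name = _local(elem.tag)
        if name == "expansion":
            expansions.append([c.text for c in elem if _local(c.tag) == "sub"])
        elif name == "replacement":
            pats = [c.text for c in elem if _local(c.tag) == "pat"]
            subs = [c.text for c in elem if _local(c.tag) == "sub"]
            replacements.append((pats, subs))
    return expansions, replacements
```

To use it on real files, read the file yourself with whatever encoding it is saved in and pass the decoded text to `parse_noise_words` or `parse_thesaurus`. Note that this only shows which words are dropped or substituted; it does not tell you which characters the word breaker splits on.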