Sql-server – SQL Server 2005: Full-text search on a multi-lingual table

full-text-searchsql serversql-server-2005

I have an articles table as follows.

+-----------+--------------+---------+
| uid       | int          | PRIMARY |
| articleId | int          | UNIQUE  |
| lang      | varchar(2)   | UNIQUE  |
| title     | varchar(500) |         |
| body      | varchar(max) |         |
+-----------+--------------+---------+

I would like to implement full-text search on the article title and body. The complication is that the title and body can be in one of six languages, indicated by the "lang" column.

I see there are noise word files for several languages, so depending on the currently set language, I would like to use the appropriate noise file when searching.

Does this require one catalog per language? If separate catalogs are used, is it possible to, for example, not include French articles in the English catalog?

Best Answer

When creating the index, it is possible to specify a "LANGUAGE" argument. Within the description of the "LANGUAGE" argument is the following:

For non-BLOB and non-XML columns containing text data in multiple languages, or for cases when the language of the text stored in the column is unknown, use the neutral (0x0) language resource.

Then, when doing a search using FREETEXT or FREETEXTTABLE, once again use the "LANGUAGE" argument. Microsoft defines the "LANGUAGE" argument as follows:

Is the language whose resources will be used for wordbreaking, stemming, and thesaurus and noise-word removal as part of the FREETEXT query. This parameter is optional and can be specified as a string, integer, or hexadecimal value corresponding to the locale identifier (LCID) of a language. If language_term is specified, the language it represents will be applied to all elements of the search condition. If no value is specified, the column full-text language is used.

So, I will use one catalog and one index on the multi-lingual table using a neutral language resource. Then I'll use the appropriate language setting when searching so everything is parsed correctly.