Sql-server – SQL Server 2008 R2 Most Efficient Full Text Search Field Type

full-text-searchsql server

In SQL Server 2008 R2, what is the most efficient column type to store data enabling full text search?

I can currently see two options, nvarchar and varbinary.

Which one performs better and why?

EDIT

If, for example, at the moment all I need is text, but in the future it might change, will there be a performance hit if i use varbinary?

Best Answer

varbinary is for binary data.

nvarchar is for character data.

Regardless of what the full text index is capable of indexing, use a data type suited to the domain.

Also, you will be able to see the contents of a nvarchar column in directly SSMS.

Performance issues are more likely to do with the frequency of maintenence of indexes, and hardware such as where your data is stored (as in a separate Filegroup), and memory usage, and indexing type (full population, incremental, manual, and auto change tracking population).

Note that SQL Server 2008 full-text search is more aggressive in memory usage and may use more I/O; 64‑bit architectures are better than 32‑bit in doing crawl tasks in general.

Sometimes crawl can be too aggressive in committing memory and this could cause sqlserver/system to have fewer resources than needed. It is highly recommended that you do the following:

  • Limit the memory used by the crawl (full population) by setting max server memory to an appropriate value to prevent the crawl task from taking all available memory.

  • Increase the page file size if applicable.

Indexing performance is generally superior to SQL Server 2005. However, in certain situations there may be some performance degradation as follows:

  • Crawl on large documents (~>8KB). In non 64‑bit platforms, this might be slower in some scenarios.

  • varchar type data. This is because SQL Server 2005 does a better job with the varchar type than with a text type, whereas in the new full-text architecture there is no difference in theory between varchar data and text data.

  • Noninteger keys, especially GUID columns used as a full-text key

  • Too many unique keywords. It has been shown in random document crawl modes that performance can be slower than SQL Server 2005 (when uniquekeycount > 80 million).

Ref: SQL Server 2008 Full-Text Search: Internals and Enhancements