Sql-server – Full text index does not honour deleted words – They are still indexed

full-text-searchindexsql-server-2008sql-server-2008-r2

I have a full text index defined as 'Automatic' radio button checked on in its Catalog. When I modify a word in the full text indexed column, the new word appears in the sys.dm_fts_index_keywords but the old,updated value does not disappear and still shows in the list. It is also true when a word is deleted from the column as well. When I rebuild the catalog though the old (or deleted) word drops off the list. It looks like full text indexes does not honour deleted words. Why would that be?

Best Answer

In real world environments the situations where a keyword is encountered and then removed and will never be encountered again is quite low.

The process for updating a record in the full text index process probably removes all previous references to the record then updates all new references. As such when the references to a keyword are removed it does not know that keyword will no longer exist because it has not yet indexed the new document.

It takes a trivial but non-zero amount of time to insert new keywords (locks and similar) and it would affect index update performance to remove and reinsert these keywords all the time (especially since in most cases there are seldom too many orphaned words). But most of all I suspect the biggest reason is the impact such a change would have on index fragmentation.

There may be a way to remove the keyword from the index when it no longer is referenced but in the real world you're not going to see any noticeable performance decrease by keeping these orphaned keywords around. The only situation where you would see a significant % of keywords appearing and going away on record updates is where you're populating content with lots of random content - in such a situation your index performance is going to be very bad anyway because your keywords will cover so very small % of all documents.