What guidelines should be considered for maintaining full-text indexes?
Should I REBUILD or REORGANIZE the full-text catalog (see BOL)? What is a reasonable maintenance cadence? What heuristics (similar to the 10% and 30% fragmentation thresholds) could be used to determine when maintenance is needed?
(Everything below is simply extra information elaborating on the question and showing what I've thought about so far.)
Extra info: my initial research
There are a lot of resources on b-tree index maintenance (e.g., this question, Ola Hallengren's scripts, and numerous blog posts on the subject from other sites). However, I have found that none of these resources provide recommendations or scripts for maintaining fulltext indexes.
There is Microsoft documentation that mentions that defragmenting the b-tree index of the base table and then performing a REORGANIZE on the full-text catalog may improve performance, but it doesn't touch on any more specific recommendations.
I also found this question, but it is primarily focused on change-tracking (how data updates to the underlying table are propagated into the fulltext index) and not the type of regularly scheduled maintenance that could maximize the efficiency of the index.
Extra info: basic performance testing
This SQL Fiddle contains code that can be used to create a full-text index with AUTO
change tracking and examine both the size and query performance of the index as data in the table is modified. When I run the script's logic on a copy of my production data (as opposed to the artificially manufactured data in the fiddle), here is a summary of the results I am seeing after each data modification step:
Even though the update statements in this script were fairly contrived, this data seems to show that there is a lot to be gained by regular maintenance.
Extra info: Initial ideas
I'm thinking about creating a nightly or weekly task. It seems that this task could perform either a REBUILD or REORGANIZE.
Because the full-text indexes may be quite large (tens or hundreds of millions of rows), I'd like to be able to detect when indexes within the catalog are sufficiently fragmented that a REBUILD/REORGANIZE is warranted. I'm a little bit unclear on what heuristics might make sense for that.
Best Answer
I wasn't able to find any good resources online, so I did some more hands-on research and thought it would be useful to post the resulting full-text maintenance plan we are implementing based on that research.
Our heuristic to determine when maintenance is needed
Our primary goal is to retain consistent full-text query performance as data evolves in the underlying tables. However, for various reasons it would be difficult for us to launch a representative suite of full-text queries against each of our databases each night and use the performance of those queries to determine when maintenance is needed. Therefore, we were looking to create rules of thumb that can be computed very quickly and used as a heuristic to indicate that full-text index maintenance may be warranted.
In the course of this exploration, we found that the system catalog provides a lot of information about how any given full-text index is divided into fragments. However, there is no official "fragmentation %" computed (as there is for b-tree indexes via sys.dm_db_index_physical_stats). Based on the full-text fragment information, we decided to compute our own "full-text fragmentation %". We then used a dev server to repeatedly make random updates of anywhere between 100 and 25,000 rows at a time to a 10 million row copy of production data, record the full-text fragmentation, and perform a benchmark full-text query using
CONTAINSTABLE
.The results, as seen in the charts above and below, were very illuminating and showed the the fragmentation measure we had created is very highly correlated to the observed performance. Since this also ties out with our qualitative observations in production, this is enough that we're comfortable using the fragmentation % as our heuristic for deciding when our full-text indexes need maintenance.
The maintenance plan
We have decided to use the following code to compute a fragmentation % for each full-text index. Any full-text indexes of non-trivial size with fragmentation of at least 10% will be flagged to be re-built by our over-night maintenance.
These queries yield results like the following, and in this case rows 1, 6, and 9 would be marked as being too fragmented for optimal performance because the full-text index is over 1MB and at least 10% fragmented.
Maintenance cadence
We already have a nightly maintenance window, and the fragmentation calculation is very cheap to compute. So we will be running this check each night and then only performing the more expensive operation of actually rebuilding a full-text index when necessary based on the 10% fragmentation threshold.
REBUILD vs. REORGANIZE vs. DROP/CREATE
SQL Server offers
REBUILD
andREORGANIZE
options, but they are available only for a full-text catalog (which may contain any number of full-text indexes) in its entirety. For legacy reasons, we have a single full-text catalog that contains all of our full-text indexes. Therefore, we have opted to drop (DROP FULLTEXT INDEX
) and then re-create (CREATE FULLTEXT INDEX
) on an individual full-text index level instead.It might be more ideal to break full-text indexes into separate catalogs in a logical way and perform a
REBUILD
instead, but the drop/create solution will work for us in the meantime.