Sql-server – Exact behaviour of CHANGE_TRACKING AUTO

full-text-searchsql server

I am trying to understand population options for SQl server's full text indexes. My target is to access whether our currently used option (default = CHANGE_TRACKING AUTO) is the best to use in our case.

I think I have understood perfectly every bit of information in the related MS reference page, except from this devilish line from the subchapter Population based on change tracking/Enable change tracking/Automatic population:

…the tracked changes are propagated automatically. The full-text
index is updated in the background, however, so propagated changes
might not be reflected immediately in the index.

Nothing in the following reference, however, gives any information about what exactly not immediately means in regards to the exact time the population happens. I have found around queries that can return the last timestamp the population took place; but how can I have an idea of how often and when the system tries to, or indeed executes, the population?

Best Answer

concerning "The full-text index is updated in the background, however, so propagated changes might not be reflected immediately in the index." from what I can tell and experience it means the change to the full-text is updated as any other clustered index, using a background task and not on the immediate instant that the data was changed and the update was triggered.

Also reading a bit more on the documentation (create fulltext index):

"CHANGE_TRACKING [ = ] { MANUAL | AUTO | OFF [ , NO POPULATION ] } Specifies whether changes (updates, deletes or inserts) made to table columns that are covered by the full-text index will be propagated by SQL Server to the full-text index. Data changes through WRITETEXT and UPDATETEXT are not reflected in the full-text index, and are not picked up with change tracking."

This is for the manual option meaning that for the AUTO the reverse can be applied, the change is done by the change tracking event and not immediate.

And here is why (Population based on change tracking):

"Optionally, you can use change tracking to maintain a full-text index after its initial full population. There is a small overhead associated with change tracking because SQL Server maintains a table in which it tracks changes to the base table since the last population. When you use change tracking, SQL Server maintains a record of the rows in the base table or indexed view that have been modified by updates, deletes, or inserts. Data changes made through WRITETEXT and UPDATETEXT are not reflected in the full-text index, and are not picked up with change tracking."

Related Solutions

Sql-server – Guidelines for full-text index maintenance

I wasn't able to find any good resources online, so I did some more hands-on research and thought it would be useful to post the resulting full-text maintenance plan we are implementing based on that research.

Our heuristic to determine when maintenance is needed

Our primary goal is to retain consistent full-text query performance as data evolves in the underlying tables. However, for various reasons it would be difficult for us to launch a representative suite of full-text queries against each of our databases each night and use the performance of those queries to determine when maintenance is needed. Therefore, we were looking to create rules of thumb that can be computed very quickly and used as a heuristic to indicate that full-text index maintenance may be warranted.

In the course of this exploration, we found that the system catalog provides a lot of information about how any given full-text index is divided into fragments. However, there is no official "fragmentation %" computed (as there is for b-tree indexes via sys.dm_db_index_physical_stats). Based on the full-text fragment information, we decided to compute our own "full-text fragmentation %". We then used a dev server to repeatedly make random updates of anywhere between 100 and 25,000 rows at a time to a 10 million row copy of production data, record the full-text fragmentation, and perform a benchmark full-text query using CONTAINSTABLE.

The results, as seen in the charts above and below, were very illuminating and showed the the fragmentation measure we had created is very highly correlated to the observed performance. Since this also ties out with our qualitative observations in production, this is enough that we're comfortable using the fragmentation % as our heuristic for deciding when our full-text indexes need maintenance.

The maintenance plan

We have decided to use the following code to compute a fragmentation % for each full-text index. Any full-text indexes of non-trivial size with fragmentation of at least 10% will be flagged to be re-built by our over-night maintenance.

-- Compute fragmentation information for all full-text indexes on the database
SELECT c.fulltext_catalog_id, c.name AS fulltext_catalog_name, i.change_tracking_state,
    i.object_id, OBJECT_SCHEMA_NAME(i.object_id) + '.' + OBJECT_NAME(i.object_id) AS object_name,
    f.num_fragments, f.fulltext_mb, f.largest_fragment_mb,
    100.0 * (f.fulltext_mb - f.largest_fragment_mb) / NULLIF(f.fulltext_mb, 0) AS fulltext_fragmentation_in_percent
INTO #fulltextFragmentationDetails
FROM sys.fulltext_catalogs c
JOIN sys.fulltext_indexes i
    ON i.fulltext_catalog_id = c.fulltext_catalog_id
JOIN (
    -- Compute fragment data for each table with a full-text index
    SELECT table_id,
        COUNT(*) AS num_fragments,
        CONVERT(DECIMAL(9,2), SUM(data_size/(1024.*1024.))) AS fulltext_mb,
        CONVERT(DECIMAL(9,2), MAX(data_size/(1024.*1024.))) AS largest_fragment_mb
    FROM sys.fulltext_index_fragments
    GROUP BY table_id
) f
    ON f.table_id = i.object_id

-- Apply a basic heuristic to determine any full-text indexes that are "too fragmented"
-- We have chosen the 10% threshold based on performance benchmarking on our own data
-- Our over-night maintenance will then drop and re-create any such indexes
SELECT *
FROM #fulltextFragmentationDetails
WHERE fulltext_fragmentation_in_percent >= 10
    AND fulltext_mb >= 1 -- No need to bother with indexes of trivial size

These queries yield results like the following, and in this case rows 1, 6, and 9 would be marked as being too fragmented for optimal performance because the full-text index is over 1MB and at least 10% fragmented.

Maintenance cadence

We already have a nightly maintenance window, and the fragmentation calculation is very cheap to compute. So we will be running this check each night and then only performing the more expensive operation of actually rebuilding a full-text index when necessary based on the 10% fragmentation threshold.

REBUILD vs. REORGANIZE vs. DROP/CREATE

SQL Server offers REBUILD and REORGANIZE options, but they are available only for a full-text catalog (which may contain any number of full-text indexes) in its entirety. For legacy reasons, we have a single full-text catalog that contains all of our full-text indexes. Therefore, we have opted to drop (DROP FULLTEXT INDEX) and then re-create (CREATE FULLTEXT INDEX) on an individual full-text index level instead.

It might be more ideal to break full-text indexes into separate catalogs in a logical way and perform a REBUILD instead, but the drop/create solution will work for us in the meantime.

Sql-server – Why is full text index falling behind

There seems to be a few queries you can try to get more information out of what is happening. The below query that was pulled from Stackoverflow takes advantage of FULLTEXTCATALOGPROPERTY.

DECLARE @CatalogName VARCHAR(MAX)
SET     @CatalogName = 'FTS_Demo_Catalog'

SELECT
    DATEADD(ss, FULLTEXTCATALOGPROPERTY(@CatalogName,'PopulateCompletionAge'), '1/1/1990') AS LastPopulated
    ,(SELECT CASE FULLTEXTCATALOGPROPERTY(@CatalogName,'PopulateStatus')
        WHEN 0 THEN 'Idle'
        WHEN 1 THEN 'Full Population In Progress'
        WHEN 2 THEN 'Paused'
        WHEN 3 THEN 'Throttled'
        WHEN 4 THEN 'Recovering'
        WHEN 5 THEN 'Shutdown'
        WHEN 6 THEN 'Incremental Population In Progress'
        WHEN 7 THEN 'Building Index'
        WHEN 8 THEN 'Disk Full.  Paused'
        WHEN 9 THEN 'Change Tracking' END) AS PopulateStatus
FROM sys.fulltext_catalogs AS cat

However, In this article, microsoft mentions

it is usually a better option to check the corresponding PopulateStatus property at the table level, TableFullTextPopulateStatus in the OBJECTPROPERTYEX system function. This and other new full-text properties in OBJECTPROPERTYEX provide more granular information about full-text indexing tables.

To this, I would use the following script:

DECLARE @TableId VARCHAR(MAX)
SET @TableId = OBJECT_ID('gGastroversion')

Select CASE OBJECTPROPERTYEX ( @TableId , 'TableFullTextPopulateStatus' )
            WHEN 0 THEN 'Idle'
            WHEN 1 THEN 'Full Population In Progress'
            WHEN 2 THEN 'Full population is in progress.'
            WHEN 3 THEN 'Propagation of tracked changes is in progress.'
            WHEN 4 THEN 'Background update index is in progress, such as autochange tracking.'
            WHEN 5 THEN 'Full-text indexing is throttled or paused.'
            WHEN 6 THEN 'An error has occurred. Examine the crawl log for details'
        END

Notice, that if you have number 6 as a result it will tell you to check the crawl log. Troubleshooting this information can be found here.

Best Answer

Related Solutions

Sql-server – Guidelines for full-text index maintenance

Sql-server – Why is full text index falling behind

Related Question