Sql-server – Validating SQL Server Change Tracking Synchronization

change-trackingdata validationsql servert-sql

I have some tables that I need to synchronize from one SQL Server database to another. I'm using SQL Server's "Change Tracking" feature. My program uses Change Tracking to synchronize the latest changes and then checks to verify that the changes were synchronized correctly.

The way that it goes about this is:

  1. It fetches the current change tracking version at the source together with some statistics about the source table that will later be used to verify that things synched correctly. The query looks like:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; 
BEGIN TRANSACTION; 
WITH T AS (
    SELECT  [NumericValue] AS [NumericValue]
    FROM [SourceTable]
) 
SELECT 
    AVG(CONVERT([NumericValue])) AS Avg, 
    SUM(CONVERT([NumericValue])) AS Sum,
    COUNT(*) AS Count, 
    CHANGE_TRACKING_CURRENT_VERSION() AS CTVersion 
FROM T; 
COMMIT TRANSACTION;
  1. It fetches all changes between the last version it synchronized to and the source version fetched in step 1. The query to fetch the changes in the desired version range looks like:
SELECT SYS_CHANGE_OPERATION, SYS_CHANGE_VERSION, [SYSKEY_PrimaryKey], [PrimaryKey], [NumericValue]
FROM (
    SELECT
        T.*,
        CT.SYS_CHANGE_OPERATION,
        CASE WHEN CT.SYS_CHANGE_OPERATION = 'I' THEN CT.SYS_CHANGE_CREATION_VERSION ELSE CT.SYS_CHANGE_VERSION END AS SYS_CHANGE_VERSION,
        [CT].[PrimaryKey] AS [SYSKEY_PrimaryKey]
    FROM CHANGETABLE(CHANGES [SourceTable], @startVersion) AS CT
        LEFT JOIN [SourceTable] AS T ON T.[PrimaryKey] = CT.[PrimaryKey]
    
    WHERE CT.SYS_CHANGE_OPERATION = 'D' 
        OR (T.[PrimaryKey] = CT.[PrimaryKey]) 
        AND 
            CASE 
                WHEN CT.SYS_CHANGE_OPERATION = 'I' THEN 
                    CT.SYS_CHANGE_CREATION_VERSION 
                ELSE CT.SYS_CHANGE_VERSION 
            END <= @endVersion
) AS [SourceTable]
  1. It gets the same statistics from the destination table that it did in step 1. The query looks like:
SELECT 
    AVG(CONVERT([NumericValue])) AS Avg, 
    SUM(CONVERT([NumericValue])) AS Sum,
    COUNT(*) AS Count
FROM [DestinationTable] 
  1. It compares the statistics from steps 1 and 2. If they match then all is good, if not then it knows there was a problem with the synchronization

The problem I'm having is that it is intermittently detecting that the values don't match on tables that experience very rapid updates. Investigating, I have found no evidence of anything actually going wrong in the synchronization process, but the destination seems to lag slightly behind where it needs to be to match the source. When the source table updates slow down, the destination always catches up and matches the source. It seems at this point that the problem stems from the version fetched in step 1 not actually matching the statistics that are fetched at the same time. This seems to indicate that the increment of the change tracking version is not performed atomically together with the actual changes to the table. Unfortunately I haven't found anything in the documentation that explicitly confirms or denies this theory, which makes it difficult to know whether there's something else I need to be looking for. Additionally, if it turns out to be true that the change tracking version and the table state aren't guaranteed to be consistent then I don't know how it would be possible for me to accurately perform validation after synchronization.

Any help clarifying whether it's expected for the version number to lag behind the actual state of the table, or suggestions of techniques I could use to get around this problem would be much appreciated.

Best Answer

If there was a delete after your initial statistics capture, your fetch all query is going to capture that. If you apply that delete in your sync, it will technically be ahead of your capture point, which would product different values.

While this is possible to occur under a low volume of changes, it would be much more likely under a high volume.

I believe you want to evaluate the OR before AND, so should wrap those in (). Also, I think you can just test against SYS_CHANGE_VERSION as I believe it will always have the same values a SYS_CHANGE_CREATION_VERSION for (I)nserts:

SELECT SYS_CHANGE_OPERATION, SYS_CHANGE_VERSION, [SYSKEY_PrimaryKey], [PrimaryKey], [NumericValue]
FROM (
    SELECT
        T.*,
        CT.SYS_CHANGE_OPERATION,
        CASE WHEN CT.SYS_CHANGE_OPERATION = 'I' THEN CT.SYS_CHANGE_CREATION_VERSION ELSE CT.SYS_CHANGE_VERSION END AS SYS_CHANGE_VERSION,
        [CT].[PrimaryKey] AS [SYSKEY_PrimaryKey]
    FROM CHANGETABLE(CHANGES [SourceTable], @startVersion) AS CT
        LEFT JOIN [SourceTable] AS T ON T.[PrimaryKey] = CT.[PrimaryKey]
    
    WHERE (CT.SYS_CHANGE_OPERATION = 'D' OR (T.[PrimaryKey] = CT.[PrimaryKey])) 
        AND SYS_CHANGE_VERSION <= @endVersion
) AS [SourceTable]