SQL Server Change Tracking – How to Detect Changes in a Table

change-trackingsql serversql-server-2012

In my application, with a DB running on SQL Server 2012, I've got a job (scheduled task) that periodically executes an expensive query and writes the results to a table that can later be queried by the application.

Ideally, I would like to run that expensive query only if something changed since the query has last executed. Since the source tables are very big, I cannot just select a checksum over all candidate columns or something like that.

I've got the following ideas:

Explicitly write a last changed timestamp, a "must be queries" flag, or something like this to a tracking table whenever I change something in a source table.
Use a trigger to do the same.

However, I'd really like to know whether there is a lightweight way to detect changes on a table without me explicitly tracking the writes. Can I, for example, get the "current" ROWVERSION of a table or something like that?

Best Answer

No, there isn't any. Any sort of 'last updated at' tracking would run into a severe performance problem as all updates, from all transactions, would attempt to update the one record tracking the 'last updated at'. This would effectively mean only one transaction can update the table at any moment, and all other transactions have to wait for the first one to commit. Complete Serialization. The number of admins/devs willing to put up with such performance penalty just for the benefit of knowing when the last update occurred is probably small.

So you are stranded to handle it via custom code. That means triggers since the alternative (detecting from log records) is a prerogative reserved only for transactional replication (or it's CDC alter-ego). Be aware that if you try to track it via a 'last updated at' column then you'll be facing exactly the serialization problem mentioned above. If update concurrency is important then you'd have to use a queue mechanism (trigger uses an INSERT and then a process aggregates the inserted values to formulate the 'last updated at'). Do not try to cheat with some 'clever' solution like sneaking at the current identity or looking up sys.dm_db_index_usage_stats. And also an 'updated_at' per-record column, like Rails timestamps have, is not working because it does not detect deletes...

Is there any 'lightweight' alternative? Actually there is one, but it is difficult to say whether it will work for you and is difficult to get it right: Query Notifications. Query Notification does exactly that, it will set up a notification if any data has changes and you need to refresh your query. Although most devs are familiar only with its .Net incarnation as SqlDependency, Query Notification can be used as a long lived, persisted mechanism to detect data change. Compared with true change tracking it is going to be really lightweight, and its semantics are closer to your needs (something, anything, changed, so you need to rerun the query).

But in the end, in your place, I would really reconsider my assumptions and go back to the drawing board. Perhaps you can use log shipping or replication to set up a reporting database, on a different server. What I read between the lines is that you're in need of a proper ETL pipe-line and an analytics data warehouse...

Related Solutions

Polling for new records in a table

Some of the implementation specifics depend on exactly what type of behaviour you need from this table (e.g. FIFO), but a typical design for a queue table uses DELETE with an OUTPUT clause:

DELETE TOP (n)
FROM QueueTable WITH (ROWLOCK, READPAST)
OUTPUT deleted.RecordID;

Or if ordered removal is required:

DELETE FROM
(
    SELECT TOP (n) RecordID
    FROM QueueTable WITH (ROWLOCK, READPAST)
    ORDER BY RecordID
) AS DF
OUTPUT deleted.RecordID;

If the records should remain in the table after processing, the table would have an extra columns to record the status (e.g. New, Processing, Processed) and probably the process id owning the row (if Processing or Processed). The de-queuing code is then modified to choose only records with the desired status. An alternative design is to DELETE rows and move them to a separate audit/archive table at the same time. As I mentioned, the specific solution depends on the precise requirements.

Sql-server – Automatically detect table name in MSSQL Server database using stored function

I think you would need to use a trigger on the table(s) in question in order to get the table name.

How I have implemented table change tracking is by just that: 1) Create a table to hold the changes and 2) Create a trigger on each table that needed to be tracked.

The triggers ALL look like this (where you replace 'XX_YOURTABLE_XX' with the table in question and [tr_XXXX] with a unique trigger name ):

CREATE TRIGGER [dbo].[tr_XXXX] on [dbo].[XX_YOURTABLE_XX] for INSERT, UPDATE, DELETE
AS

DECLARE 
    @bit INT ,
    @field INT ,
    @maxfield INT ,
    @char INT ,
    @fieldname VARCHAR(128) ,
    @TableName VARCHAR(128) ,
    @PKCols VARCHAR(1000) ,
    @sql VARCHAR(2000), 
    @UserName VARCHAR(128) ,
    @Type char(1) ,
    @PKSELECT VARCHAR(1000)

    SELECT @TableName = 'XX_YOURTABLE_XX'

    -- Get User
    IF object_id('tempdb..#TmpUser') IS NOT NULL
        SELECT  @UserName = TheUser FROM #TmpUser
    ELSE
        SELECT  @UserName = system_user

    -- Action
    IF EXISTS (SELECT * FROM INSERTED)
        IF EXISTS (SELECT * FROM DELETED)
            SELECT @Type = 'U'  --UPDATE
        ELSE
            SELECT @Type = 'I'  --INSERT
    ELSE
        SELECT @Type = 'D'      --DELETE

    -- get lISt of columns
    SELECT * INTo #ins FROM INSERTED
    SELECT * INTo #del FROM DELETED

    -- Get primary key columns for full outer join
    SELECT  @PKCols = coalesce(@PKCols + ' AND', ' on') + ' i.' + c.COLUMN_NAME + ' = d.' + c.COLUMN_NAME
    FROM    INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ,
            INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
    WHERE   pk.TABLE_NAME = @TableName
            AND CONSTRAINT_TYPE = 'PRIMARY KEY'
            AND c.TABLE_NAME = pk.TABLE_NAME
            AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME

    -- Get primary key SELECT for INSERT
    SELECT @PKSELECT = coalesce(@PKSELECT+'+','') + '''<' + COLUMN_NAME + '=''+convert(VARCHAR(100),coalesce(i.' + COLUMN_NAME +',d.' + COLUMN_NAME + '))+''>''' 
    FROM    INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ,
            INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
    WHERE   pk.TABLE_NAME = @TableName
            AND CONSTRAINT_TYPE = 'PRIMARY KEY'
            AND c.TABLE_NAME = pk.TABLE_NAME
            AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME

    IF @PKCols IS NULL
    BEGIN
        RAISERROR('no PK on table %s', 16, -1, @TableName)
        RETURN
    END

    SELECT @field = 0, @maxfield = max(ORDINAL_POSITION) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @TableName
    WHILE @field < @maxfield
    BEGIN
        SELECT @field = min(ORDINAL_POSITION) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @TableName AND ORDINAL_POSITION > @field
        SELECT @bit = (@field - 1 )% 8 + 1
        SELECT @bit = power(2,@bit - 1)
        SELECT @char = ((@field - 1) / 8) + 1
        IF substring(COLUMNS_UPDATED(),@char, 1) & @bit > 0 or @Type in ('I','D')
        BEGIN
            SELECT @fieldname = COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @TableName AND ORDINAL_POSITION = @field
            SELECT @sql =       'INSERT tbl_TRACKING (Type, TableName, PK, FieldName, OldValue, NewValue, UserName)'
            SELECT @sql = @sql +    ' SELECT ''' + @Type + ''''
            SELECT @sql = @sql +    ',''' + @TableName + ''''
            SELECT @sql = @sql +    ',' + REPLACE(REPLACE(REPLACE(@PKSELECT,'<',''),'>',''),'PriKeyVal=','')
            SELECT @sql = @sql +    ',''' + @fieldname + ''''
            SELECT @sql = @sql +    ',convert(VARCHAR(1000),d.' + @fieldname + ')'
            SELECT @sql = @sql +    ',convert(VARCHAR(1000),i.' + @fieldname + ')'
            SELECT @sql = @sql +    ',''' + @UserName + ''''
            SELECT @sql = @sql +    ' FROM #ins i full outer join #del d'
            SELECT @sql = @sql +    @PKCols
            SELECT @sql = @sql +    ' WHERE i.' + @fieldname + ' <> d.' + @fieldname 
            SELECT @sql = @sql +    ' or (i.' + @fieldname + ' IS NULL AND  d.' + @fieldname + ' IS not NULL)' 
            SELECT @sql = @sql +    ' or (i.' + @fieldname + ' IS not NULL AND  d.' + @fieldname + ' IS NULL)' 
            exec (@sql)
        END
    END

GO

The tracking table looks like this:

CREATE TABLE [dbo].[tbl_TRACKING](
    [Type] [char](1) NULL,
    [TableName] [varchar](128) NULL,
    [PK] [varchar](1000) NULL,
    [FieldName] [varchar](128) NULL,
    [OldValue] [varchar](1000) NULL,
    [NewValue] [varchar](1000) NULL,
    [ActionDate] [datetimeoffset](3) NULL,
    [UserName] [varchar](128) NULL,
    [AppName] [varchar](128) NULL,
    [ComputerName] [varchar](128) NULL
)

Hope that helps

Best Answer

Related Solutions

Polling for new records in a table

Sql-server – Automatically detect table name in MSSQL Server database using stored function

Related Question