SQL Server Change Tracking – How to Detect Changes in a Table

Tags: change-tracking, sql server, sql-server-2012

In my application, with a DB running on SQL Server 2012, I've got a job (scheduled task) that periodically executes an expensive query and writes the results to a table that can later be queried by the application.

Ideally, I would like to run that expensive query only if something has changed since the query was last executed. Since the source tables are very big, I cannot just select a checksum over all candidate columns or something like that.

I've got the following ideas:

  • Explicitly write a last-changed timestamp, a "must be queried" flag, or something like this to a tracking table whenever I change something in a source table.
  • Use a trigger to do the same.

However, I'd really like to know whether there is a lightweight way to detect changes on a table without me explicitly tracking the writes. Can I, for example, get the "current" ROWVERSION of a table or something like that?

Best Answer

No, there isn't any. Any sort of 'last updated at' tracking would run into a severe performance problem, as all updates, from all transactions, would attempt to update the one record tracking the 'last updated at'. This would effectively mean that only one transaction can update the table at any moment, and all other transactions have to wait for the first one to commit: complete serialization. The number of admins/devs willing to put up with such a performance penalty just for the benefit of knowing when the last update occurred is probably small.

So you are left to handle it via custom code. That means triggers, since the alternative (detecting changes from log records) is a prerogative reserved for transactional replication (or its CDC alter ego). Be aware that if you try to track it via a single 'last updated at' record then you'll be facing exactly the serialization problem mentioned above. If update concurrency is important then you'd have to use a queue mechanism (the trigger does an INSERT, and a separate process then aggregates the inserted values to formulate the 'last updated at'). Do not try to cheat with some 'clever' solution like peeking at the current identity or looking up sys.dm_db_index_usage_stats. Also, a per-record 'updated_at' column, like Rails timestamps, does not work because it does not detect deletes.

Is there any 'lightweight' alternative? Actually there is one, but it is hard to say whether it will work for you, and it is hard to get right: Query Notifications. Query Notifications do exactly that: they set up a notification that fires if any data has changed and you need to refresh your query. Although most devs are familiar only with its .NET incarnation, SqlDependency, Query Notifications can be used as a long-lived, persisted mechanism to detect data changes. Compared with true change tracking it is going to be really lightweight, and its semantics are closer to your needs (something, anything, changed, so you need to rerun the query).

But in the end, in your place, I would really reconsider my assumptions and go back to the drawing board. Perhaps you can use log shipping or replication to set up a reporting database on a different server. What I read between the lines is that you're in need of a proper ETL pipeline and an analytics data warehouse...