SQL Server – Best practices for transferring data from long, wide tables

change-tracking, data-warehouse, sql-server, sql-server-2014

As part of my data warehouse project, I'll be transferring data from our OLTP systems to the data warehouse. Some of the tables are long and wide, so obviously I'll only be transferring the required columns.

To reduce the overhead, I'm considering change tracking on these tables so we only look at changed values rather than scanning the complete table. Unfortunately, our source system runs on SQL Server 2014 Standard Edition, so CDC is an option we cannot take advantage of.

Is there an alternative method that would be better than change tracking?

Best Answer

Have you considered adding a rowversion column to your source tables?

The simplest way to think about rowversion (formerly timestamp in SQL Server) is as a database-wide autonumber. Every INSERT or UPDATE that touches a table containing a rowversion column increments the database-wide counter, and the affected row's rowversion column is stamped with the new value.
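For example, the column can be added to an existing table like this (the table name dbo.Orders and column name RowVer are placeholders for illustration, not anything from your schema):

```sql
-- Hypothetical table and column names; any rowversion column will do.
-- Existing rows are stamped with an initial value when the column is added,
-- so allow time for the ALTER on very large tables.
ALTER TABLE dbo.Orders
    ADD RowVer rowversion;
```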

Here's a brief algorithm for how you could use this (a T-SQL sketch follows the list):

  • Save the database-wide rowversion value (available via @@DBTS; i.e., DataBase TimeStamp)
  • Process your data into your Data Warehouse
  • Next time through, only process rows where the rowversion value is greater than the previously saved @@DBTS
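A minimal sketch of that loop, assuming a RowVer rowversion column on a source table dbo.Orders and a small control table dbo.ETL_Watermark that stores the last captured @@DBTS value per source table (all of these names are illustrative, not part of your schema):

```sql
DECLARE @LastLoaded  binary(8);
DECLARE @CurrentDbts binary(8) = @@DBTS;   -- capture the watermark before extracting

-- 1. Read the watermark saved by the previous load.
SELECT @LastLoaded = LastRowVersion
FROM dbo.ETL_Watermark
WHERE TableName = N'dbo.Orders';

-- 2. Extract only the rows touched since that load (and only the columns the DW needs).
SELECT o.OrderID, o.CustomerID, o.OrderDate, o.Amount
FROM dbo.Orders AS o
WHERE o.RowVer >  @LastLoaded
  AND o.RowVer <= @CurrentDbts;

-- 3. Once the warehouse load has committed, persist the new watermark.
UPDATE dbo.ETL_Watermark
SET LastRowVersion = @CurrentDbts
WHERE TableName = N'dbo.Orders';
```

Capturing @@DBTS up front and bounding the extract with RowVer <= @CurrentDbts keeps the load window consistent: rows modified while the extract runs get a higher rowversion and simply fall into the next run instead of being skipped.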

Note that while you should not miss any rows that should be processed, you may re-process rows with no actual change in data. This is because the rowversion is incremented with every UPDATE statement, even if the data is "UPDATEd" to its current value and not actually changed.
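To illustrate, a statement that rewrites a column with its existing value still produces a new rowversion (again using the hypothetical dbo.Orders table):

```sql
-- A "no-op" update: the data does not change, but the row is still touched,
-- so its RowVer column gets a new, larger value and the row will be picked up
-- again by the next incremental load.
UPDATE dbo.Orders
SET Amount = Amount
WHERE OrderID = 42;
```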

Also, it won't help identify which columns might have been changed. As the name implies, it works strictly at the row level.