Sql-server – Capturing change deltas when database replication or CDC breaks

change-data-capturereplicationsql server

The company I work for is in the process of implementing change data capture on a Microsoft BI stack for the purpose of capturing incremental changes that need to be loaded into the data warehouse. So far this has worked great for us, we have the database replication setup and table level CDC setup on the replicated database for the specific tables and columns we need to track. The problem that we are running into is that both replication and CDC have the potential to break. When this happens any changes that occur from the time it breaks to the time it is resolved will not be captured and thus cannot be passed to the data warehouse.

I'm wondering what the best practice is to identify all the deltas and get them loaded into the data warehouse? In theory you can rebuild your entire data warehouse but then what is the best way to keep historical Type 2 data?

Best Answer

Assuming that there was a serious failure which causes you to need to 'reconstitute' the missing type 2 changes, you may be able to use your source database data and log backups. If you have preserved these backups you can do something like this process outline:

  1. Restore a copy of the source database to a point just prior to when you experienced the failure. Name the database [dbnameEmergency].
  2. Restore the logs incrementally to [dbnameEmergency], such as 1 minute, 5 minutes, 30 minutes, according to how finely you want to process the deltas in order to reduce the damage.
  3. Reading and comparing the incremental restores will require either creating snapshots, if you have Enterprise edition, or restoring the log to STANDBY mode so as to read the data.
  4. Compare the latest readable changes in [dbnameEmergency] against the state of the [dbnameReplica] (which is the database you are trying to set straight) and write code to apply the changes you are able to infer to [dbnameReplica].

Note: If your database is using a rowversion (or timestamp) column then finding the changed rows will be simpler.

This is not a perfect process and will doubtless miss some details. But it would give you an approach to help you patch up the damage.

Having said that: It is up to you to decide whether this process is worth the effort.