The company I work for is in the process of implementing change data capture on a Microsoft BI stack for the purpose of capturing incremental changes that need to be loaded into the data warehouse. So far this has worked great for us, we have the database replication setup and table level CDC setup on the replicated database for the specific tables and columns we need to track. The problem that we are running into is that both replication and CDC have the potential to break. When this happens any changes that occur from the time it breaks to the time it is resolved will not be captured and thus cannot be passed to the data warehouse.
I'm wondering what the best practice is to identify all the deltas and get them loaded into the data warehouse? In theory you can rebuild your entire data warehouse but then what is the best way to keep historical Type 2 data?
Best Answer
Assuming that there was a serious failure which causes you to need to 'reconstitute' the missing type 2 changes, you may be able to use your source database data and log backups. If you have preserved these backups you can do something like this process outline:
Note: If your database is using a rowversion (or timestamp) column then finding the changed rows will be simpler.
This is not a perfect process and will doubtless miss some details. But it would give you an approach to help you patch up the damage.
Having said that: It is up to you to decide whether this process is worth the effort.