SSIS ETL vs CDC for Real-Time Data Warehouse Reporting

data-warehouseetlsql serversql-server-2012

I am currently working on SSIS package for continuous incremental data loads of the tables (40 tables now and expected to increase in near future) from OLTP system to a Reporting database residing on a different server.

The request is for real time data sync between the two sources.

Should I continue and test it with just regular SSIS -> Staging -> merge route OR should I look into implementing CDC. Will that be faster than plain old ETL considering the number of tables involved?

Best Answer

CDC doesn't replace the SSIS -> Staging -> Merge route, it just makes the SSIS part easier - the Extraction of ETL. CDC makes this much easier, it's one of the use cases it was designed for. For every change, you get a copy of the row before and after the change, and you can grab a hold of it and then clean it up.

It does have some gotcha's that aren't obvious before you start. 2 big changes that caught me out were:

  • Schema changes are harder on the source tables, particularly for deployment tools that drop and create tables instead of just amending
  • If you are using mirroring or Availability Groups, the cdc reader will pause when the replica goes offline (see: Change Data Capture not running when Mirror is Suspended)

It also doesn't provide any way to do a full re-sync of the data, it's only ever additive.

I don't think it's possible to answer the question should I do one or the other, the answer (as is often the case) is "It Depends". It's definitely worth investigating though.

Also note, CDC is only available in Enterprise Edition up to SQL 2016 SP1 (when it becomes available in all editions).