Downsides of Not Deleting CDC Records in SQL Server Change Tables

change-data-capturereplicationsql server

If I do something like this:

USE AdventureWorks2012;
GO  
EXECUTE sys.sp_cdc_change_job   
    @job_type = N'cleanup',  
    @retention = 52494800; -- (100 years)
GO 

The records in the change tables will never (until year 2119) be deleted.

Also, I'm applying a script like this to move data from an old instance to a new one every time a new column is added to the tracked table, and I know that if there are many records, it can take a little bit of time to copy them to the new change table.

So, I'm planning to keep the change history forever on these change tables, and never purge the tables using the cleanup job. But the documentation says:

Data that is deposited in change tables will grow unmanageably if you
do not periodically and systematically prune the data

If CDC had been in plance since 2007 in my DB, the biggest common change table will contain ~20K rows in 12 years, the source table is replicated to another server, so, having said that:

  • What does 'grow unmanageably' means for me?
  • Will it have vey negative impact on the performance or storage in the DB? Maybe an impact on Replication performance?
  • Am I just fine doing CDC with no cleanup job at all? or
  • Is it a better alternative to copy the data to a reporting DB using a SSIS package or a job runing a stored procedure?
    • If so, is there a simple way to handle the creation of new columns in the destination table based on new columns added in the change tables?
  • Does the answer change if instead of 20K rows in one change table there are 1 million?

Best Answer

What does 'grow unmanageable' means for me?

This all depends on your environment. If you had a DB that was subject to a lot of transactions (one with a lot of CRUD operations), then this table would grow much faster and larger than one with minimal changes making it harder to manage.

Will it have very negative impact on the performance or storage in the DB? Maybe an impact on Replication performance?

The larger your change tables grow, the longer your ETL / change data query functions that are used to store the data on another instance would take to run (they would be slower). You haven't stated of you are storing these records elsewhere, or just using the change tables as your end point for auditing, but it seems like the latter.

Am I just fine doing CDC with no cleanup job at all?

If your change table isn't expected to grow > 20K rows ever you probably wouldn't see a performance burden.

Is it a better alternative to copy the data to a reporting DB using a SSIS package or a job running a stored procedure? If so, is there a simple way to handle the creation of new columns in the destination table based on new columns added in the change tables?

Based on this question, and the fact that you state you already have a script that is copying data to another server, it seems like you are trying to capture changes multiple ways. One way you may want to explore is System-Versioned Temporal Tables. This may be the easiest route for you and could eliminate CDC and your SSIS / custom script especially since your DML events seem to be low. These tables will also change when you conduct DDL changes to your source table.

Does the answer change if instead of 20K rows in one change table there are 1 million?

Since that would be 50X the number of rows, sure, you can expect different results regarding performance of any queries against those change tables.