Sql-server – CDC Capture job failure/CT table column changes after applying 2014 SP2+CU4

change-data-capturesql serversql server 2014

We had an issue with CDC where the capture job failed and wouldn't restart itself. It was in a DEV environment so no harm but has anyone seen these errors or know what exactly went wrong with CDC?

SQL 2014 EE 12.0.5540 – 2 node AG with SSISDB

enter image description here

Procedure or function sp_batchinsert_1663605265 has too many arguments
specified. For more information query the sys.dm_cdc_errors dynamic
management view

The Log-Scan Process failed to construct a replicated command from log
sequence number (LSN) {00002d0d:0000f11f:0002}. Back up the
publication database and contact Customer Support Services. For more
information

Log Scan process failed in processing log records. Refer to previous
errors in the current session to identify the cause and correct any
associated problems. For more information query the sys.dm_cdc_errors
dynamic management view

We attempted to drop and recreate the capture job using sp_cdc_drop_job abd sp_cdc_add_job. We also attempted failing over to the other node as well as server/service restarts. Nothing seemed to let CDC restart itself.

We applied CU4 on Friday through AG rolling patching so we are on the latest and greatest. This issue cropped up right after we patched and failed over.

Added Log query result

enter image description here

Best Answer

It looks like CU2+SP4 (12.0.5540) changes the structure of the underlying _CT system tables that get generated. It adds the column [__$command_id] int null.

When you upgrade in place to the new CU, the capture jobs will start failing or go to 'between retries...' with the following errors:

enter image description here enter image description here enter image description here

Dropping and recreating the capture job does not work (using cdc_add_job). I had to drop the capture agent CDC job, disable CDC (sp_cdc_disable_table) on the table and then re-enable CDC (sp_cdc_enable_table), which re-adds the job and successfully fires the job and begins collecting.

The issue is that, upon dropping and re-enabling CDC, you lose the previously existing _CT table and all of it's data.

The only information I ahve found about CU4 changing CDC is the following from MS: https://support.microsoft.com/en-us/help/3030352

It says that they fixed the ordering issue, but do not list any structural changes to the CDC system tables.

UPDATE 1: I've opened a paid support case with MS regarding this issue. There's also a connect item open for it: https://connect.microsoft.com/SQLServer/Feedback/Details/3130381

UPDATE 2: Per MS, using the command sp_vupgrade_replication after patching to CU4 will allow the capture agent job to start successfully. It still doesn't address the jobs failing before the command is run. It also does not address this added column not being documented anywhere. Still waiting to see what the final verdict is.

UPDATE 3: MS has officially posted to their blog and recognized this as an issue: https://blogs.msdn.microsoft.com/sql_server_team/cdc-functionality-may-break-after-upgrading-to-the-latest-cu-for-sql-server-2012-2014-and-2016/

No word yet on if they will be re-releasing the CU4 patch to not require manual intervention to restart the CDC jobs or officially document the column change.