AWS DMS – Resolving 2 Million CDCIncomingChanges Without Target Throughput

amazon-dmsreplicationsql server

We're attempting to use AWS DMS to replicate MS-SQL to a customer's on-prem DB, and running into a confusing situation. At 13:00 UTC there are over 2 million CDCIncomingChanges, with CDCTargetLatency of 60k. Somehow, by 14:50 UTC these changes are processed and latency is resolved down to zero while CDCThroughputRowsTarget stays at zero. It would seem that somehow there are ~2 million incoming changes from the source DB, which are essentially stuck for 60,000 seconds (16 hours), yet they all amount to no rows being changed on the target.

We've increased the capacity of our replication instance and have tried increasing various constraints in our Task Settings. I've read all the AWS DMS docs including their three-part "Debugging DMS Migrations" series and I've looked through the logs with maximum debug level, yet I still don't get what is happening here.

In our task settings, we have BatchApplyEnabled:true and BatchApplyPreserveTransaction:true, so my only idea is that this could be caused by a long-running transaction. In general though, I don't know how to go about debugging this, or determining the root cause. Any help is appreciated.

Task Status Dashboard:
Task Status dashboard

Replication Instance Dashboard:
Replication Instance dashboard

Best Answer

It would seem that every row in a DB was effectively being deleted and recreated without any changes, and that batchApply was enabled. Batch apply would effectively cancel out large numbers of changes, which is why the number of CDCIncomingChanges was reduced without throughput.

Related Question