Do not replicate Delete operations AWS DMS

amazon-dmsawsreplication

I am using AWS DMS to replicate ongoing changes with SQL Server as a source and target endpoint. The tasks are running and replicating data with low latency. However, I need the tasks configured not to replicate DELETE operations from the source database.

I did read the whole documentation of the tool and I wasn't able to find such a configuration option to exclude DELETE statements for replicating. There is only one place in the whole user guide where it states that if one is performing a Full Load plus CDC or a CDC-only task, it is recommended the migration to be paused and secondary indexes to be created that will support filtering for update and delete statements. But no more, no further explanation.

Any help or experience is highly appreciated, please.

Best Answer

I've recently encountered a similar problem (my source endpoint is SQL Server and target endpoint is Redshift) and after a futile search through AWS documentation, I've contacted AWS support. They have confirmed that there is no direct way to filter out replication of DELETE statements while using a single DMS replication task.

However, they have suggested a workaround, which I think might also solve your issue.

Basically, you need to set up a migration task from SQL Server to S3, which serves as an intermediary step, and then from S3 to the final destination (which in your case is again the SQL Server).

Reason for this is that when you are setting up the target endpoint, you can specify an Extra Connection Attribute (ECA) to modify its behaviour. One of these ECAs can actually filter out all UPDATE and DELETE statements, but strangely, it is available only when creating an S3 target endpoint.

  1. Set up a migration task from SQL Server to an S3 bucket, which will serve as an intermediary. As a target endpoint, choose an S3 bucket and at the bottom of the endpoint setup screen at the 'Endpoint-specific settings', you specify an ECA cdcInsertsOnly=true; This way, only INSERT statements are written to the output file (either .csv or .parquet) during the CDC load. [1]

  2. Set up a second DMS task to migrate the now filtered data from S3 to the target endpoint of your choice. You will need to create a source endpoint for the S3 bucket from previous steps. You will also need to provide a JSON with external table definition so that DMS is able to replicate the data correctly. [2]

Of course, it is possible that this inability to filter out DELETE statements only applies to the Redshift target endpoint and there is some easier way to solve your problem. However, the evidence suggests that there is indeed no direct way since the ECA cdcInsertsOnly is available only in S3 target endpoint and not in SQL Server nor Redshift.