I was running some simple tests on a publisher and subscriber on the same instance:
- I inserted 50,000 rows into a published article and the data was
correctly pushed to the subscriber. - I inadvertently deleted the 49,985th record on the subscriber (but
didn't realize it at the time) - I deleted 50,000 rows from the published article
As I was watching the size of the subscriber table, I noticed it would drop down in row count almost to zero and then go back to 50,000. Deletes would start running again, the row count would drop, and then it back to 50,000. This kept happening over and over.
I ran a standard trace and saw all the deletes running normally. After it was almost complete, this statement appeared:
IF @@TRANCOUNT > 0 ROLLBACK
Then I modified the trace to incude error messages and saw this:
The row was not found at the Subscriber when applying the replicated command.
So it looks like when the distribution agent is executing all the deletes one-by-one, the process is still included in a transaction (the original DELETE was one line). If there is an error, it performs a ROLLBACK and then starts over.
My questions:
- Is there a standard way to break out of the loop?
- Would it be acceptable to recreate the missing row so the process
can complete? - What do you do if this happens on production? I guess that's not
really a question. But this seems like a simple error could have
some pretty serious complications unless replication is monitored
very carefully.
Best Answer
The error is caused by the code in the custom delete stored procedure for each article:
If no row is deleted, a special procedure is called and the transaction is rolled back.
In order to avoid this, I've created a post-Snapshot script that alters the all delete / update replication stored procedures and logs the error to a table instead.