So I just recently got to enable CDC on an always on AG. While many sources claimed enabling CDC on alwaysON AG is as easy as it is on any standalone server, turns out it really isn't. As MS says here:
You should create the [CDC capture & cleanup] jobs at all of the possible failover targets
before failover, and mark them as disabled until the availability
replica at a host becomes the new primary replica. The CDC jobs
running at the old primary database should be also disabled when the
local database becomes a secondary database. To disable and enable
jobs, use the @enabled option of sp_update_job (Transact-SQL). For
more information about creating CDC jobs, see sys.sp_cdc_add_job
(Transact-SQL).
Turns out after we enabled CDC on the primary replica, absolutely nothing happened on the secondary. So as per MS's advice, I did a manual failover, and created the jobs there, then did another failover back to the original primary.
And apparently as hinted at in the quote above, as fully compatible as CDC is supposed to be with alwaysOn AG's, it is entirely unable to determine whether it's currently running on the primary or secondary replica.
So, with the background out of the way, let's go to my question: How do you guarantee that the CDC jobs are always enabled on the current primary replica? And always disabled (so as not to produce errors) on the secondaries?
Since obviously a big point in alwaysOn AG's is that you don't need to manually be there the second the primary fails for whatever reason, because a secondary will pick up the slack and keep the system online, I assume it's then expected to just create a job on each of the replicas, constantly polling their status and changing the states of the CDC jobs as it does?
If this is correct, do you have any specific advice on how to go about this, specific scripts or guides etc? Because for what should obviously be a fairly common dilemma around the globe, it seems odd that I'd have to manually create a new TSQL job on each and every replica just to carry out what is essentially basic bare essentials functionality of CDC with alwaysOn AG.
This seems unlikely, which is why I'm asking. Or did I misunderstand something? Thanks!
Best Answer
The more common and accepted way is that once a job lands on a server in an AG you start editing it:
This has been explored in a few other SE questions (they don't specifically relate to CDC which is why you didn't find them):
How to run SQL Job only on primary replica in AlwaysOn Availability Groups?
SQL Server Agent Jobs and Availability Groups
The second way is to create a watchdog job using similar methods which will toggle the jobs on and off. This is probably more acceptable seeing as you're fiddling with SQL Server internal jobs. But it's up to you.
I wasn't quite happy with the samples above hardcoding AG names and such. I prefer to just have the job step operate within the database in question, and this way it can work out for itself if it even has a group and is part of it (as your jobs should continue running even if you temporarily remove them from an AG).
This kind of thing is also useful for SSIS jobs and other bits that pop up over time.
Going one step further you probably want to create a PowerShell script which will go through and check your servers from time to time for AGs which have CDC jobs and which don't have a watchdog, so that you will be notified to create one.