Sql-server – CDC with AlwaysOn Availability Groups

availability-groupschange-data-capturesql server

So I just recently got to enable CDC on an always on AG. While many sources claimed enabling CDC on alwaysON AG is as easy as it is on any standalone server, turns out it really isn't. As MS says here:

You should create the [CDC capture & cleanup] jobs at all of the possible failover targets
before failover, and mark them as disabled until the availability
replica at a host becomes the new primary replica. The CDC jobs
running at the old primary database should be also disabled when the
local database becomes a secondary database. To disable and enable
jobs, use the @enabled option of sp_update_job (Transact-SQL). For
more information about creating CDC jobs, see sys.sp_cdc_add_job
(Transact-SQL).

Turns out after we enabled CDC on the primary replica, absolutely nothing happened on the secondary. So as per MS's advice, I did a manual failover, and created the jobs there, then did another failover back to the original primary.

And apparently as hinted at in the quote above, as fully compatible as CDC is supposed to be with alwaysOn AG's, it is entirely unable to determine whether it's currently running on the primary or secondary replica.

So, with the background out of the way, let's go to my question: How do you guarantee that the CDC jobs are always enabled on the current primary replica? And always disabled (so as not to produce errors) on the secondaries?

Since obviously a big point in alwaysOn AG's is that you don't need to manually be there the second the primary fails for whatever reason, because a secondary will pick up the slack and keep the system online, I assume it's then expected to just create a job on each of the replicas, constantly polling their status and changing the states of the CDC jobs as it does?

If this is correct, do you have any specific advice on how to go about this, specific scripts or guides etc? Because for what should obviously be a fairly common dilemma around the globe, it seems odd that I'd have to manually create a new TSQL job on each and every replica just to carry out what is essentially basic bare essentials functionality of CDC with alwaysOn AG.

This seems unlikely, which is why I'm asking. Or did I misunderstand something? Thanks!

Best Answer

The more common and accepted way is that once a job lands on a server in an AG you start editing it:

  • Add a first job step that checks whether this is the primary.
  • Exit on success if it isn't.
  • Otherwise continue onto the normal job step.

This has been explored in a few other SE questions (they don't specifically relate to CDC which is why you didn't find them):

The second way is to create a watchdog job using similar methods which will toggle the jobs on and off. This is probably more acceptable seeing as you're fiddling with SQL Server internal jobs. But it's up to you.

I wasn't quite happy with the samples above hardcoding AG names and such. I prefer to just have the job step operate within the database in question, and this way it can work out for itself if it even has a group and is part of it (as your jobs should continue running even if you temporarily remove them from an AG).

Set Nocount On

If  Exists (
    -- Database in an AG
    Select  Top 1 0
    From    sys.availability_databases_cluster adc
    Join    sys.availability_groups ag
    On      adc.group_id = ag.group_id
    Join    sys.dm_hadr_availability_group_states dhags
    On      ag.group_id = dhags.group_id
    Where   adc.database_name = Db_Name()
    )
And Not Exists (
    -- Database in an AG which is Primary on this instance
    Select  Top 1 0
    From    sys.availability_databases_cluster adc
    Join    sys.availability_groups ag
    On      adc.group_id = ag.group_id
    Join    sys.dm_hadr_availability_group_states dhags
    On      ag.group_id = dhags.group_id
    Where   adc.database_name =  Db_Name()
    And     Upper(dhags.primary_replica) = Upper(@@Servername)
    )
Begin
Raiserror('This is not the primary replica.', 16, 1) With Nowait
Return
End

-- Check if the database isn't accessible
If  Not Exists (
    Select  Top 1 0
    From    sys.databases
    Where   databases.name = Db_Name()
    And     state_desc = 'ONLINE'
    )
Begin
    Raiserror('The database doesn''t exist or isn''t ONLINE on this node.', 16, 1) With Nowait
    Return
End

This kind of thing is also useful for SSIS jobs and other bits that pop up over time.

Going one step further you probably want to create a PowerShell script which will go through and check your servers from time to time for AGs which have CDC jobs and which don't have a watchdog, so that you will be notified to create one.