Sql-server – How to make SQL Server Agent job history available after failing over an AG

availability-groupssql serversql-server-agent

I am implementing a new HA/DR architecture using Always On AGs for use in both a HA a DR scenario.

I have mastered a mechanism to ensure that jobs are disabled on the secondary replica and enabled when a failover takes place but now realise that, once this happens, no job history will be available. Job history is essential to determine what hadn't run or failed on the now defunct primary and needs to be run on the new primary.

I am struggling to find any documentation about how to overcome this and make that data available on the new primary.

Has anyone been confronted with this and how did you get around it?

Best Answer

I have mastered a mechanism to ensure that jobs are disabled on the secondary replica and enabled when a failover takes place […]

Hopefully that's using sys.fn_hadr_is_primary_replica in the job/jobstep and not actually enabling and disabling jobs.

[…] but now realise that, once this happens, no job history will be available.

You are correct. Job history is stored in MSDB which is a system database, by default on SQL Server 2017 and below system databases cannot be part of an availability group. This means the job history will only be local.

The only way, currently, to change that is to either write your own code to insert the job history into a user database (such as a dba tools database) or run the jobs from their own independent job server (which may be some enterprise job scheduler). There is nothing built in to do this automatically.

Job history is essential to determine what hadn't run or failed on the now defunct primary and needs to be run on the new primary.

I would suggest building your own job logging table and writing the necessary logging as part of the jobsteps in each job so that you can know. This isn't just an AG thing, it's a where should I start and stop my jobs based on what I've already done thing and that's the root issue. Creating your own logging in a user database will solve this problem, though it's not the only way (such as using job scheduling software on another server).



Edit: I debated whether or not to put this in because while it is officially announced, all of the details behind the feature aren't public and are subject to change. However, in SQL Server 2019 (which isn't out yet, CTP 2.3 as of this writing) there are System AGs which do have master and model as part of the availability group. Since this functionality isn't currently available, the public documentation isn't available, and some items are subject to change, I hate to put this as an option but included it for completeness.