HA SQL AlwaysOn backup using Copy Only + Logs from secondary replica

availability-groupsbackupresource-governortimeout

I'm hoping to elicit some guidance from your collective DBA experiences.

Here's our problem:

Full backups of our larger databases (around 3TB) are consuming too much resources such that we are suffering lengthy timeouts. It appears to be mostly I/O bound.

Full backups are done once a week, with regular 15min transaction log backups in between. We have to take full backups at some point.

Options we are considering:

  1. Upgrade from SQL 2016 Standard to 2016 Enterprise edition and configure the Resource Governor to limit I/O caused by backups. Basically choke the backups a little to allow servicing of application requests better.

  2. Go the HA (High Availability) route and deploy a secondary node with SQL AlwaysOn perform backups from the secondary replicas. This would relieve the primary replica of backup duties and remain available for application requests, potentially resolving our timeouts issue.

Pro's & Con's:

The cost of Enterprise edition is not trivial and while it offers numerous other benefits, it's a hard pill to swallow at this time.

For a similar or even lesser cost, we can deploy a second SQL 2016 Standard edition server and benefit from a HADR-based database deployment. This option is very appealing to us as we currently only have a cold standby for recovery.

We would likely favor the benefits from HADR over Enterprise, provided we can resolve the timeouts issue during backups.

Some Lab Findings:

For Enterprise edition, some labs tests show we can control the backup resources reasonably well, but it doesn't feel ideal. We'll need to tweak the numbers to the hardware to find the "sweet spot" for achieving the required level of restriction for backups without adversely slowing it down too much.

For AlwaysOn, we've done some lab tests and found that we cannot perform typical "Full" backups on the secondary replica, but we can make COPY_ONLY "Full" backups. This makes sense as it preserves the log chain of the primary replica.

We are able to make transaction log backups on the secondary replica which too is fine since that log chain is actually a "slave" from the primary and should not cause any harm when made on the secondary replica.

Have confirmed we are able to restore from the CopyOnlyBackup+TransactionLogs, so this could be a viable data recovery strategy.

Findings and Questions:

  • Has anyone had a similar issue and resolved it with either option, confirming viability of the option for us?

  • In the "CopyOnlyBackup+TransactionLogs from the secondary replica" scenario, any full backup on the primary replica would break the log chain and invalidate the CopyOnlyBackup of the secondary replica for subsequent transaction log backups. The remediation is to make another CopyOnlyBackup on the secondary replica, or to not make full backups on the primary.

  • What is the implication of not making normal full backups on the primary and only ever making COPY_ONLY full backups. Is this OK in the long term?

Any guidance would be greatly appreciated. Hopefully we not barking up the wrong tree.

Best Answer

Im actually using a similar setup with 1TB of the database.

  1. Has anyone had a similar issue and resolved it with either option, confirming viability of the option for us?

Im using backup on secondary only.Even in few outages, I tried to use copy only full backups and log backups to recover them.

No issues that using copy only backups.

The above statement will give a clarity for your 2 and 3'rd questions.