SQL Server 2017 – Understanding SQL Availability Groups

availability-groupssql-server-2017

I have a question in regards to SQL Standard availability groups. I currently have an AG setup with one primary and one secondary replica. I am using synchronous commit but was curious about asynchronous. How far will/can the secondary replica be behind? If the secondary replica goes offline for any reason will it catch completely up once it comes online? How long can the secondary be down before it will not catch up? I understand the difference between the 2 commit types (one verifies the transaction was committed before handing back to the application whereas the other commit type doesn't). What I am not sure of is how long the secondary can be offline in an asynchronous commit type.

Best Answer

How far will/can the secondary replica be behind?

That depends on a number of factors. Network speed between the 2 nodes, disk speed on the secondary, volume of data being transmitted. Many of these KPIs are in the AG Overview dashboard (Right-lick the AG in SSMS and select 'show dashboard'). You can select a slew of different metrics including log send queue, log redo queue, estimated recovery time, estimated data loss, etc. As stated in the comments below from scsimon, when in asynchronous, the secondary isn't every truly 'caught up'. It will always show a 'synchronizing..' status and not 'synchronized'.

If the secondary replica goes offline for any reason will it catch completely up once it comes online?

Yes. Also worth noting that during the time where your secondary replica is unavailable/disconnected, you will not be able to back up any transactions from the log that haven't been sent to the secondary. In in sys.databases the column log_reuse_Wait_desc will be populated with 'AVAILABILITY REPLICA' as transactions cannot be flushed out unless they are committed on the secondary.

How long can the secondary be down before it will not catch up?

The secondary won't just 'not catch up'. If you take the secondary down (without removing it from the AG), log backups won't flush transactions out, log files will fill up and then you'll run out of room, causing any future transactions to fail. The secondary server's amount of time that it can be down is largely dependent on how much space you have and what your transactional volume is. Once the secondary comes back up, it will begin applying transactions that occurred while it was down. I've seen this take from a few minutes to several hours.

For brief outages/patching, leaving the secondary connected is fine, but if you're looking at a long duration outage, it may be easier to just remove the secondary replica and not worry about the hassle of log files filling up. Then you just re-initialize the DBs into the AG once the outage completes.