Can someone please confirm if this is correct?
Yes, you are 100% correct!
Is there any way we can mitigate this, ie. ensuring there's no index maintenance during this window or switching to aynschronous commit mode?
The mitigation is not with the mode that the replicas run it, it's removing that replica from the AG.
Does this pertain to my situation as described above and if so does this mean that the potential for the transaction to fill is mitigated by design?
Yes, that's what will happen if you change nothing. No, the transaction log will still fill up. Regardless of synchronous or asynchronous modes the primary must keep all log blocks that are required for the replica that is the furthest behind.
In your scenario, there are only two replicas a primary and a secondary. The secondary will be offline for a while. You've already correctly stated what will happen, the send queue will get larger as the replica is offline and stays that way. This happens no matter the mode.
The mitigation is to remove the replica that will be down for the extended period of time. This allows the primary to properly truncate the log with proper transaction log backups.
Once the secondary is back up, since it was kicked out of the AG all of the databases participating in that AG will be in a restoring state. To get back to a good steady state and bring the replica back into the AG, take the log backups that were run during the period the secondary replica was offline and apply them to the databases, leaving them in a restoring state (with no recovery). Once you get close, pause all transaction log backups on the primary and join the replica back into the AG. Resume the log backups on the primary.
Due to hardware maintenance the IP address of the secondary node will change.
This shouldn't be a problem if the node ip itself is changing. It's only a slight issue if cluster resource ips are changing.
Is there any actions I should take or anything I should be aware of from a SQL Server, AlwaysON Availability Groups and WSFC perspective?
From the WSFC point of view, assuming that the subnet does not change, there shouldn't be anything needed here.
From SQL Server/AlwaysOn AGs... assuming the subnet does not change:
- Double check the AG endpoint to see if it's bound to that specific ip. If it is, that will need changed. If it isn't (wizard generated, etc.) it'll say 'ALL' and that won't need changed.
- If SQL Server was setup to listen on specific ip's and ports in configuration manager, this will need changed. If it's set for IPALL, nothing needs done.
- If you're using read only routing, double check the read only routing urls and endpoints.
- Service broker endpoints (similar to AG endpoint)
From an overall point of view:
- Make sure the firewall rules have been edited for the new ip if needed.
This may not be an exhaustive list, but should give you an idea and the most critical areas to look at.
Best Answer
That depends on a number of factors. Network speed between the 2 nodes, disk speed on the secondary, volume of data being transmitted. Many of these KPIs are in the AG Overview dashboard (Right-lick the AG in SSMS and select 'show dashboard'). You can select a slew of different metrics including log send queue, log redo queue, estimated recovery time, estimated data loss, etc. As stated in the comments below from scsimon, when in asynchronous, the secondary isn't every truly 'caught up'. It will always show a 'synchronizing..' status and not 'synchronized'.
Yes. Also worth noting that during the time where your secondary replica is unavailable/disconnected, you will not be able to back up any transactions from the log that haven't been sent to the secondary. In in sys.databases the column log_reuse_Wait_desc will be populated with 'AVAILABILITY REPLICA' as transactions cannot be flushed out unless they are committed on the secondary.
The secondary won't just 'not catch up'. If you take the secondary down (without removing it from the AG), log backups won't flush transactions out, log files will fill up and then you'll run out of room, causing any future transactions to fail. The secondary server's amount of time that it can be down is largely dependent on how much space you have and what your transactional volume is. Once the secondary comes back up, it will begin applying transactions that occurred while it was down. I've seen this take from a few minutes to several hours.
For brief outages/patching, leaving the secondary connected is fine, but if you're looking at a long duration outage, it may be easier to just remove the secondary replica and not worry about the hassle of log files filling up. Then you just re-initialize the DBs into the AG once the outage completes.