Sql-server – Which is the best approach to “suspend” the activity of a whole AOG cluster with the intention of starting it again later

availability-groups, clustering, maintenance, sql-server

My company is going to migrate some services from one on-prem site to another, but the database servers will remain in the same location (for now).

I've been asked to "suspend" all activity on the three AOG cluster nodes (all secondary instances are readable, without read-only routing) during a long maintenance window.

Two main options come to mind:

1. Maybe not the fanciest: suspend data movement, stop the SQL Server Database Engine service on the secondary replicas first, and stop the primary last. When it is time to start the services again, do it in reverse order: start the primary first, then the secondaries, and resume data movement at the end.
2. Disable jobs and the relevant logins on all instances, keeping all AOG activity running but without user access, and enable them again later.
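Option 2 could look roughly like this. This is a minimal sketch, run on every instance (logins and Agent jobs are per instance, not per AG); the login [AppUser] and the job N'Nightly ETL' are hypothetical names.

```sql
-- Run on EVERY replica. [AppUser] and N'Nightly ETL' are hypothetical names.

-- Block new connections from an application login.
-- Note: sessions that are already connected are NOT terminated by this.
ALTER LOGIN [AppUser] DISABLE;

-- Disable a SQL Server Agent job so it does not run during the window.
EXEC msdb.dbo.sp_update_job
    @job_name = N'Nightly ETL',
    @enabled  = 0;

-- After the maintenance window, reverse both changes:
-- ALTER LOGIN [AppUser] ENABLE;
-- EXEC msdb.dbo.sp_update_job @job_name = N'Nightly ETL', @enabled = 1;
```

Existing sessions would still need to be ended separately if the goal is zero user activity.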

I know the "best option" is completely relative to the context, but I would like to read some opinions.

Thank you,

Best Answer

You don't need to suspend data movement, but otherwise what you've written for option #1 is a good approach! Before doing anything, I would disable automatic failovers on all replicas, to prevent any unexpected failovers when you bring the AG back up.
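A sketch of switching the replicas to manual failover, run on the current primary. The AG name [AG1] and the replica names COMPUTER01 to COMPUTER03 are assumptions; it is worth recording each replica's current `failover_mode_desc` first so you know what to restore later.

```sql
-- Run on the PRIMARY. [AG1] and the COMPUTER0x names are hypothetical.
-- Record the current settings before changing anything.
SELECT replica_server_name, availability_mode_desc, failover_mode_desc
FROM sys.availability_replicas;

-- Switch every replica to manual failover.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER01' WITH (FAILOVER_MODE = MANUAL);
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER02' WITH (FAILOVER_MODE = MANUAL);
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER03' WITH (FAILOVER_MODE = MANUAL);
```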

Then, as you said, shut down the secondaries first, then shut down the primary.
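Before stopping anything, it may help to confirm that the secondaries are caught up. A sketch, run on the primary; the `SHUTDOWN` step is left commented out because stopping the service from SQL Server Configuration Manager or Failover Cluster Manager is equally valid.

```sql
-- Run on the PRIMARY: one row per database per remote (secondary) replica.
-- Synchronous-commit secondaries should show SYNCHRONIZED, and both queues
-- should be at or near 0 before you stop the service.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)      AS database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,      -- KB of log not yet sent to the secondary
       drs.redo_queue_size           -- KB of log not yet redone on the secondary
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 0
ORDER BY ar.replica_server_name, database_name;

-- Then on each secondary first, and on the primary last:
-- SHUTDOWN;   -- requires sysadmin or serveradmin
```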

When you bring the primary back up, it should come online as the primary again. Then bring up the secondaries.
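To confirm that the first instance you start really came back as the primary, a query along these lines could be run on it (a sketch; it only reads catalog views and DMVs).

```sql
-- Run on the instance started first: expect role_desc = PRIMARY
-- and operational_state_desc = ONLINE.
SELECT ag.name                      AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.operational_state_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id
JOIN sys.availability_groups AS ag
    ON ag.group_id = ars.group_id
WHERE ars.is_local = 1;
```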

Make sure to turn automatic failover back on once everything is stable, if it was enabled in the first place.
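A sketch of restoring automatic failover, run on the primary. It assumes (hypothetically) that COMPUTER01 and COMPUTER02 in [AG1] were the automatic-failover partners before the window; automatic failover only works between replicas in SYNCHRONOUS_COMMIT mode.

```sql
-- Run on the PRIMARY once all replicas are healthy again.
-- Only re-enable AUTOMATIC on replicas that had it before (hypothetical names).
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER01' WITH (FAILOVER_MODE = AUTOMATIC);
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER02' WITH (FAILOVER_MODE = AUTOMATIC);

-- Verify.
SELECT replica_server_name, availability_mode_desc, failover_mode_desc
FROM sys.availability_replicas;
```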

Related Solutions

Sql-server – ApplicationIntent=ReadOnly Traffic when no Readable Secondary Available

There are several steps to configuring a server to accept ReadOnly traffic. The following link walks you through them (http://msdn.microsoft.com/en-us/library/hh710054.aspx), but basically you need to configure each server in the AG and then set up the routing for each one.

Here's the T-SQL involved:

```sql
-- COMPUTER01 as a secondary: accept read-intent connections
-- and publish the URL the listener should route them to.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER01' WITH
    (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER01' WITH
    (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://COMPUTER01.contoso.com:1433'));

-- COMPUTER02 as a secondary: same settings.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER02' WITH
    (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER02' WITH
    (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://COMPUTER02.contoso.com:1433'));

-- Routing lists used while each replica is the primary: send read-only
-- connections to the other replica first, then to the primary itself.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER01' WITH
    (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('COMPUTER02', 'COMPUTER01')));

ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'COMPUTER02' WITH
    (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('COMPUTER01', 'COMPUTER02')));
GO
```

Sounds like you may be missing the configuration and/or routing information for the primary.
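To check what routing is actually configured, the catalog views can be queried on any replica. A sketch; the column aliases are illustrative only.

```sql
-- For each replica (while it is primary), list where read-only traffic is routed,
-- in priority order, along with each target's routing URL and read setting.
SELECT ag.name                                 AS ag_name,
       pr.replica_server_name                  AS when_primary_is,
       rl.routing_priority,
       sr.replica_server_name                  AS route_read_only_to,
       sr.read_only_routing_url,
       sr.secondary_role_allow_connections_desc
FROM sys.availability_read_only_routing_lists AS rl
JOIN sys.availability_replicas AS pr
    ON pr.replica_id = rl.replica_id
JOIN sys.availability_replicas AS sr
    ON sr.replica_id = rl.read_only_replica_id
JOIN sys.availability_groups AS ag
    ON ag.group_id = pr.group_id
ORDER BY ag.name, pr.replica_server_name, rl.routing_priority;
```

An empty result, or a NULL `read_only_routing_url`, would point to the missing configuration described above.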

SQL Server Always On – Listener Behavior and DB Removal

Our plan is to remove the databases from the AGs on the primary instances, but leave the listener in place so the applications can still connect to the DBs via the VIP/DNS.

We cannot suspend data movement, as the outage could be long and MS recommends suspending movement only for short periods.

I would not remove the databases on the primary node from the availability group.

I would, however, remove the affected secondary replicas from the availability group. Removing them would accomplish two things:

Since these are secondary replicas, their databases will remain in a restoring state. This will be helpful later.
It allows the primary and any unaffected secondary replicas to stay in the availability group and continue taking log backups, so the log can be reused.
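Removing a replica is a single statement on the primary. A sketch, with [AG1] and COMPUTER03 as hypothetical names for the AG and the affected secondary:

```sql
-- Run on the PRIMARY: remove the affected secondary from the AG.
ALTER AVAILABILITY GROUP [AG1] REMOVE REPLICA ON N'COMPUTER03';

-- Run on COMPUTER03 afterwards: its copies of the AG databases
-- should now be listed as RESTORING.
SELECT name, state_desc
FROM sys.databases
WHERE state_desc = N'RESTORING';
```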

During the time the secondary replicas are removed from the AG, continue to take log backups as normal. This will facilitate the reuse of the log so that it doesn't grow out of control. Keep these log backups handy and ready for action.
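If the log backups are not already scheduled, each one amounts to something like the following. This is a sketch: [SalesDb] and the UNC path are hypothetical, and a timestamped file name keeps every backup distinct so none is overwritten.

```sql
-- Run where your log backups normally run (the primary, in this scenario).
-- [SalesDb] and \\backupsrv\sql\ are hypothetical.
DECLARE @file nvarchar(260) =
    N'\\backupsrv\sql\SalesDb_'
    + FORMAT(SYSDATETIME(), N'yyyyMMdd_HHmmss')
    + N'.trn';

BACKUP LOG [SalesDb] TO DISK = @file WITH CHECKSUM;
```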

Once the affected secondary replicas are available again, copy all of the log backups taken while they were out of the AG and apply them to those databases. When applying the log backups, make sure to keep the databases in a restoring state by specifying WITH NORECOVERY on each log restore.
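A sketch of finding and applying those backups. The database name, the cut-off date (standing in for when the replica was removed), and the file path are all hypothetical placeholders.

```sql
-- On the PRIMARY: list log backups taken since the replica was removed,
-- in LSN order (the order they must be restored in).
DECLARE @since datetime = '20240101';   -- hypothetical removal time

SELECT bs.backup_finish_date, bs.first_lsn, bs.last_lsn, bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
    ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = N'SalesDb'
  AND bs.type = 'L'                     -- 'L' = transaction log backup
  AND bs.backup_finish_date >= @since
ORDER BY bs.first_lsn;

-- On the removed SECONDARY: restore each file in that order,
-- leaving the database in RESTORING.
RESTORE LOG [SalesDb]
    FROM DISK = N'\\backupsrv\sql\SalesDb_20240101_010000.trn'
    WITH NORECOVERY;
```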

Finally, suspend the log backups and restore any final ones that were taken while you were restoring the older ones. This brings the databases on the previously removed secondary replicas to the same point in time as the primary and any other secondary replicas.
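If log backups run as a SQL Server Agent job, suspending them could look like this. The job name N'LogBackup - SalesDb' and the database name are hypothetical.

```sql
-- On the PRIMARY: pause the (hypothetical) log backup job.
-- This does not stop a run that is already in progress.
EXEC msdb.dbo.sp_update_job
    @job_name = N'LogBackup - SalesDb',
    @enabled  = 0;

-- Identify the most recent log backup: it must be the last one
-- restored WITH NORECOVERY on the secondary.
SELECT TOP (1) bs.backup_finish_date, bs.last_lsn, bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
    ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = N'SalesDb'
  AND bs.type = 'L'
ORDER BY bs.backup_finish_date DESC;
```

Once the replicas have rejoined, the same procedure with `@enabled = 1` re-enables the job.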

Once the final log backup has been applied and the databases are still in a restoring state, add the replicas back into the AG. Because the databases are still restoring and have been brought up to the last log backup, the AG will be able to join the replicas and databases without issue. There will be a short period during which the replicas need to catch up.
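A sketch of re-adding a replica with manual seeding. All names and the endpoint URL/port are hypothetical; `ENDPOINT_URL`, `AVAILABILITY_MODE` and `FAILOVER_MODE` are required in `ADD REPLICA`, and the other options should match the replica's previous settings.

```sql
-- 1) On the PRIMARY: add the replica back (hypothetical names and endpoint).
ALTER AVAILABILITY GROUP [AG1]
    ADD REPLICA ON N'COMPUTER03' WITH (
        ENDPOINT_URL      = N'TCP://COMPUTER03.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL
    );

-- 2) On COMPUTER03: join the instance to the AG ...
ALTER AVAILABILITY GROUP [AG1] JOIN;

-- ... then join each database that is still in RESTORING.
ALTER DATABASE [SalesDb] SET HADR AVAILABILITY GROUP = [AG1];
```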

Once the secondary replicas and databases are rejoined, resume log backups as normal.

This would be the ideal process because it keeps your AG intact (for any unaffected secondary replicas), continues to use the listener for your applications, can still provide HA (and, to some extent, DR, depending on which replicas are available), continues to allow log backups and log reuse, and stays transparent to end users.
