Sql-server – AlwaysON Availability Groups – IP address & subnet change of secondary replica node

availability-groups, clustering, sql-server

This is a follow-up to the question:
AlwaysON Availability Groups – IP address change of secondary replica node

Same scenario: an Availability Group in synchronous commit mode, with one primary and one secondary replica in a multi-subnet configuration. The cluster nodes are physical machines. Due to hardware maintenance, the server used by the secondary replica cluster node is being moved, so it will go offline and come back online with a different IP address in a different subnet. How would one approach this?

My initial thoughts are: If possible, add the new network interface to the secondary node. Configure the new network interface with the new IP address which is in a different subnet. The cluster should automatically build the internal routes and register the new network interface.

In FCM (Failover Cluster Manager) a new cluster network will appear under the 'Networks' tab. In cluster core resources, add the new network and create a new static IP address for the cluster VNN. Apply the changes then go back in and add an OR dependency on the new IP address.

Prior to the server move outage where the old IP address/network interface will be removed, in FCM, go back to cluster core resources, go into the properties of the cluster VNN and remove the old network/static IP address and remove any dependencies on it. When the server is taken offline and comes back online, the old network interface won't be visible to the server and won't appear in the networks tab in FCM. There shouldn't be any issues with any clustered resources as any dependencies on the old IP/subnet/network were removed.

Is there anything else that should be taken into consideration? Since the cluster nodes are physical, does this complicate matters with regard to the network interfaces? Testing in a virtual environment is obviously simpler, as removing and attaching virtual network switches is easy.

Best Answer

How would one approach this?

With a whole bunch of pre-move work :)

My initial thoughts are: /snipped for brevity

You have the right ideas; let me add a few more.

Pre-Work

  • The cluster is now multi-subnet, so we'll need to add an IP for the cluster name in the second subnet. This will be added to the cluster name resource as an 'OR' dependency as you've stated above. This can be added at any time (I'd add it pre-move). The cluster CAP will then have 2 IP addresses, one for each subnet.
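  • Firewall rules, etc. The new subnet needs the same paths open as the old one: SQL Server client connections, the AG (database mirroring) endpoint, and cluster traffic between the nodes.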
  • The listener (as you've already stated) will become multi-subnet. Another IP address will need to be added in the 'OR' dependency.
  • Double check the settings of the client access point (CAP [aka VNN, aka network name]) for HostRecordTTL and RegisterAllProvidersIP to make sure they are set up the way you want them. Note that you may want to use multiple listeners to accommodate clients that use older connection libraries that do not support the newer connection string keywords (such as MultiSubnetFailover).
  • Decide if you want to remove the replica from the AG (if it's going to take a while to physically move the server) or just suspend data movement. If the replica is removed, its databases will go back into a "restoring..." state and you'll be able to catch them up with restores at a later point after the server arrives and is set up.
  • Quorum: There isn't much to say here, it's a 2 node cluster with 1 node getting airline miles. Included because we'll hit this later and it's a common question.
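
As a pre-move sanity check, it can help to confirm that the addresses you plan to hand out (new node IP, second cluster name IP, second listener IP) really fall inside the new subnet before adding them as 'OR' dependencies. Below is a minimal sketch using Python's standard `ipaddress` module. The subnet names, CIDR ranges, and addresses are made-up placeholders, not values from the question.

```python
import ipaddress


def check_planned_ips(subnets, planned):
    """Map each planned IP label to the name of the subnet containing it, or None."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    result = {}
    for label, ip in planned.items():
        addr = ipaddress.ip_address(ip)
        # First subnet that contains the address; None if no subnet matches.
        result[label] = next(
            (name for name, net in networks.items() if addr in net), None
        )
    return result


# Hypothetical values for illustration only.
SUBNETS = {"old-site": "10.1.0.0/24", "new-site": "10.2.0.0/24"}
PLANNED = {
    "secondary-node-ip": "10.2.0.10",
    "cluster-name-ip": "10.2.0.20",
    "listener-ip": "10.2.0.21",
}

if __name__ == "__main__":
    for label, subnet in check_planned_ips(SUBNETS, PLANNED).items():
        print(f"{label}: {subnet or 'NOT IN ANY KNOWN SUBNET'}")
```

Anything reported as not belonging to a known subnet is worth fixing before touching the cluster name or listener resources.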

Post-Work

  • The server should (with a new IP and proper firewall rules) be able to contact the cluster and join.
  • Run the cluster validation wizard. Save this report as the initial post-move report. We may need to look at it later. This will also verify most cluster-related items and runs very quickly.
  • Verify network latency. Set CrossSubnetDelay and CrossSubnetThreshold appropriately for the latency and health of the connection. No changes may be needed but it's good to double and triple check.
  • Restore transaction logs/diffs/etc.
  • Add the replica back into the AG
  • Revisit Quorum. Even though there are just two nodes in the cluster, we'll want to double check we're not using a disk witness. Depending on the version of Windows we should/could use either a file share witness or an Azure cloud witness (the latter requires Windows Server 2016 or later).
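
Two of the items above come down to simple arithmetic, sketched here in Python. The first function multiplies the heartbeat interval (CrossSubnetDelay, in milliseconds) by the number of missed heartbeats tolerated (CrossSubnetThreshold) to get how long a node in the other subnet can be silent before the cluster acts. The second is a simplified static majority check for quorum. It deliberately ignores dynamic quorum and dynamic witness behaviour, so treat it as an illustration rather than a model of what Windows will actually do. The function names and input values are my own examples.

```python
def heartbeat_tolerance_ms(cross_subnet_delay_ms, cross_subnet_threshold):
    """Milliseconds of missed heartbeats before a cross-subnet node is considered down."""
    return cross_subnet_delay_ms * cross_subnet_threshold


def keeps_quorum(node_votes, witness_votes, votes_lost):
    """Static majority check: more than half of all configured votes must remain."""
    total = node_votes + witness_votes
    majority = total // 2 + 1
    return (total - votes_lost) >= majority


if __name__ == "__main__":
    # Example values only; check your own cluster's settings.
    print(heartbeat_tolerance_ms(1000, 5))  # 5000 ms

    # Two nodes, no witness, one node offline (being moved): quorum is lost.
    print(keeps_quorum(node_votes=2, witness_votes=0, votes_lost=1))  # False

    # Two nodes plus a witness, one node offline: quorum is kept.
    print(keeps_quorum(node_votes=2, witness_votes=1, votes_lost=1))  # True
```

If the measured latency between the two sites eats a noticeable share of the tolerance window, that's the signal to revisit the delay and threshold values. The quorum check shows why the witness matters while one node is in transit.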

There may be additional items specific to your environment, but that should be the gist of it. You've pretty much hit the nail on the head in your original question/post; this just adds in a little filler :)