Sql-server – Patching FileShareWitness for AlwaysOn AG

Tags: availability-groups, high-availability, sql server, sql-server-2016

Situation:
We have an AlwaysOn High Availability group with 4 replicas (2 for DR), a listener, and a File Share Witness (FSW). I've been tasked with creating a set of instructions for patching all of the servers in the environment.

Problem:
The plan is based on Pinal Dave's article (with some modifications).

The problem is that when we patch the FSW, the AG switches back and forth between replicas.

A colleague has suggested removing the possible failover value from the IP (not from the AOAG) to avoid the failover, and then putting it back afterwards.

Please advise whether this is a good idea, or whether there is a better way to stop the AG from switching back and forth between replicas.

SQL is 2016 Enterprise SP1 running on WS2012R2 Datacenter 6.3 with Hypervisor

Any help will be much appreciated.

Best Answer

Normally, rebooting the File Share Witness should be totally fine in this setup as long as both AG replicas are up and healthy when you patch the FSW server. So you shouldn't need to remove anything from the WSFC settings to accomplish your plan.

I had a situation a while back where

  • the file share witness was down for maintenance
  • the PRIMARY and SECONDARY both switched to the RESOLVING state briefly (about 15 seconds)
  • then they both returned to their original roles

This was due to the File Share Witness resource running in the same Resource Hosting Subsystem (RHS) process as the AG node resources. When the FSW failed its "are you up" checks, it caused the whole RHS process to fail, which resulted in the "blip" in the AG.

You can avoid this problem by setting the FSW resource to run in a separate RHS process in WSFC settings:

[Screenshot: FSW resource set to run in a separate resource monitor]
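
The same setting can also be applied from a script instead of the GUI. Below is a minimal sketch in Python that shells out to PowerShell's FailoverClusters module and sets the resource's SeparateMonitor common property to 1. It assumes the witness resource is named "File Share Witness" (check the actual name with Get-ClusterResource) and that it runs on a cluster node with administrative rights. The function names are hypothetical, and the change may only take effect once the resource has been taken offline and brought back online.

```python
import subprocess


def separate_monitor_command(resource_name: str = "File Share Witness") -> list[str]:
    """Build a PowerShell invocation that sets SeparateMonitor = 1.

    resource_name is an assumption: use the name that Get-ClusterResource
    reports for the witness in your cluster.
    """
    # Double any single quotes so the name is safe inside a
    # PowerShell single-quoted string.
    quoted = resource_name.replace("'", "''")
    script = (
        "Import-Module FailoverClusters; "
        f"(Get-ClusterResource -Name '{quoted}').SeparateMonitor = 1"
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]


def apply_separate_monitor(resource_name: str = "File Share Witness") -> None:
    """Run the command on the local cluster node; raises if PowerShell fails."""
    subprocess.run(separate_monitor_command(resource_name), check=True)


if __name__ == "__main__":
    # Print the command for review rather than executing it blindly.
    print(" ".join(separate_monitor_command()))
```

Printing the command first makes it easy to review, or to paste the PowerShell part directly into an elevated console, before changing anything on a production cluster.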

I blogged in detail about that here: Troubleshooting an AG Failure


If there is something else causing this problem in your case, you can still troubleshoot it using the steps outlined in my blog post:

  • look at the SQL Server error log
  • look at the WSFC cluster log
  • review the AlwaysOn_health extended event session
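
As a rough aid for the first of those steps, here is a small Python sketch that pulls replica state changes out of a SQL Server error log so that the sequence of transitions (for example PRIMARY to RESOLVING and back) is easier to see. The regular expression is an assumption based on the usual wording of the "state of the local availability replica ... has changed from ... to ..." message and may need adjusting for your build. The default encoding (UTF-16) is also an assumption about how the ERRORLOG file is written, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed wording of the replica state-change message in the error log.
STATE_CHANGE = re.compile(
    r"availability group '(?P<ag>[^']+)' has changed from "
    r"'(?P<old>[^']+)' to '(?P<new>[^']+)'"
)


def find_state_changes(lines):
    """Return (ag_name, old_state, new_state) for each matching log line."""
    changes = []
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            changes.append((match["ag"], match["old"], match["new"]))
    return changes


def read_error_log(path, encoding="utf-16"):
    """Read an error log file into a list of lines (encoding is an assumption)."""
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return text.splitlines()
```

Calling find_state_changes(read_error_log(path)) on each replica's error log and comparing the results against the time the FSW went down should show whether the role changes line up with the witness outage.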

If you don't want to dig through every single one of these places, you can also use the Failover Detection Utility provided by Microsoft Support.