SQL Server: Error in WSFC role while applying SQL Server 2016 SP2 CU7 in a multi-instance environment

availability-groups, clustering, sql-server-2016

Please advise on the following error:

Cluster resource 'AG1_NAME' of type 'SQL Server Availability Group' in clustered role 'AG1_NAME' failed

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Environment:

AG1: (Primary = SRV01\DEV1, Secondaries = SRV02\DEV1, SRV03\DEV1)
AG2: (Primary = SRV02\DEV2, Secondaries = SRV03\DEV2, SRV01\DEV2)

The error occurred while updating the AG2 replicas from SQL Server 2016 SP2 CU4 to SQL Server 2016 SP2 CU7.

The update order was as follows; the update wizard itself reported no errors:

  1. Set failover mode to manual on SRV01\DEV2
  2. Updated SRV01\DEV2 (noticed the WSFC error mentioned above)
  3. Set failover mode to automatic on SRV01\DEV2
  4. Set failover mode to manual on SRV03\DEV2
  5. Updated SRV03\DEV2
  6. Set failover mode to automatic on SRV03\DEV2
  7. Manually failed over from SRV02\DEV2 (primary) to SRV03\DEV2
  8. Updated SRV02\DEV2
  9. Manually failed back from SRV03\DEV2 to SRV02\DEV2 (primary)

Is it normal for the first SQL Server instance to be interrupted while the second instance on the same server is being updated, when the servers take part in availability groups? Or should we follow a particular method to avoid this kind of error?

Fortunately, when I checked the Roles page in Failover Cluster Manager immediately after the error (and also ran Get-ClusterResource in PowerShell), AG1 and all other WSFC resources were working normally. However, I'm concerned about the production update and future updates. Any suggestions would be appreciated. Thanks!

Best Answer

Is it normal for the first SQL Server instance to be interrupted while the second instance on the same server is being updated, when the servers take part in availability groups? Or should we follow a particular method to avoid this kind of error?

No, it's not expected. The error does, however, indicate that the cluster tried to bring the resource online and failed at least 3 times (the default, unless you changed this value). This leads me to believe that the instance was patched while it was a primary, which would of course fail while the service is offline for patching. If it wasn't a primary, you'll need to look into the SQL Server error log and the cluster log to understand what happened.
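Since the diagnosis hinges on which replica was primary at the moment each instance was patched, a rolling-update plan can be checked for that before patching production. The Python sketch below replays a plan against the AG topology from the question and flags any patch step whose target is currently a primary. It also lists primaries of other AGs hosted on the same server as the patched instance (as with SRV01\DEV1 when SRV01\DEV2 was updated), so you know which roles to watch during that step. The topology dictionary, the action tuples and the `audit_plan` / `server_of` / `primary_of` helpers are hypothetical: the sketch does not connect to SQL Server or the cluster, and it omits the failover-mode changes from the question, so its step numbers do not match the numbered list above.

```python
from typing import Dict, List, Tuple

# Hypothetical model: AG name -> {replica "SERVER\INSTANCE": role}.
# Roles are "PRIMARY" or "SECONDARY"; nothing here talks to SQL Server or WSFC.
Topology = Dict[str, Dict[str, str]]
Action = Tuple[str, ...]  # ("patch", instance) or ("failover", ag, new_primary)


def server_of(instance: str) -> str:
    """Return the host part of a SERVER\\INSTANCE name."""
    return instance.split("\\", 1)[0]


def primary_of(topology: Topology, ag: str) -> str:
    """Return the replica currently holding the PRIMARY role for an AG."""
    return next(r for r, role in topology[ag].items() if role == "PRIMARY")


def audit_plan(topology: Topology, plan: List[Action]) -> List[str]:
    """Replay the plan and return findings for patch steps worth a second look."""
    state = {ag: dict(replicas) for ag, replicas in topology.items()}  # don't mutate input
    findings = []
    for step, action in enumerate(plan, start=1):
        if action[0] == "failover":
            _, ag, new_primary = action
            state[ag][primary_of(state, ag)] = "SECONDARY"  # demote old primary
            state[ag][new_primary] = "PRIMARY"
        elif action[0] == "patch":
            target = action[1]
            for ag in state:
                primary = primary_of(state, ag)
                if primary == target:
                    findings.append(f"step {step}: {target} is PRIMARY of {ag}")
                elif server_of(primary) == server_of(target):
                    findings.append(
                        f"step {step}: {primary} (PRIMARY of {ag}) shares a host with {target}"
                    )
    return findings


# Topology and plan taken from the question (failover-mode changes omitted).
topology = {
    "AG1": {"SRV01\\DEV1": "PRIMARY", "SRV02\\DEV1": "SECONDARY", "SRV03\\DEV1": "SECONDARY"},
    "AG2": {"SRV02\\DEV2": "PRIMARY", "SRV03\\DEV2": "SECONDARY", "SRV01\\DEV2": "SECONDARY"},
}
plan = [
    ("patch", "SRV01\\DEV2"),
    ("patch", "SRV03\\DEV2"),
    ("failover", "AG2", "SRV03\\DEV2"),
    ("patch", "SRV02\\DEV2"),
    ("failover", "AG2", "SRV02\\DEV2"),
]

for finding in audit_plan(topology, plan):
    print(finding)
# prints: step 1: SRV01\DEV1 (PRIMARY of AG1) shares a host with SRV01\DEV2
```

For this plan, the only step flagged is the one where the error appeared, and no patched instance was a primary at patch time. Treat a finding as a prompt to review that step, not as proof of the cause; the SQL Server error log and the cluster log remain the place to confirm what happened.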