SQL Server – Always On manual failover failed

Tags: availability-groups, sql-server, sql-server-2016

We are running SQL Server 2016. We tried to manually fail over our availability group (2 replicas, no listener), and the failover failed with the error message below.

The Windows cluster service is up and running on the nodes, and the cluster reports no errors. I tried adding NT AUTHORITY\SYSTEM as sysadmin on the SQL Server nodes.

The problem persists.

Error message:

Manual Failover failed (Microsoft.SqlServer.Management.HadrTasks)

------------------------------
Program Location:

   at Microsoft.SqlServer.Management.Hadr.FailoverTaskWorkItem.DoWork()
   at Microsoft.SqlServer.Management.TaskForms.SimpleWorkItem.Run()

===================================

Failed to perform a manual failover of the availability group 'XX' to server instance 'XXX'. (Microsoft.SqlServer.Management.HadrModel)

------------------------------
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=13.0.16000.28+((SSMS_Rel).161022-0456)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

------------------------------
Program Location:

   at Microsoft.SqlServer.Management.HadrModel.Task.Perform(IExecutionPolicy policy, CancellationToken token, ScenarioTaskHandler taskDelegate)
   at Microsoft.SqlServer.Management.Hadr.FailoverTaskWorkItem.DoWork()

===================================

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

------------------------------
Program Location:

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteNonQuery(String sqlCommand, ExecutionTypes executionType, Boolean retry)
   at Microsoft.SqlServer.Management.Smo.ExecutionManager.ExecuteNonQuery(String cmd, Boolean retry)
   at Microsoft.SqlServer.Management.Smo.SqlSmoObject.DoCustomAction(String script, String toplevelExceptionMessage)

===================================

Failed to move a Windows Server Failover Clustering (WSFC) group to the local node (Error code 5963).  The WSFC service may not be running or may not be accessible in its current state, or the specified cluster group or node handle is invalid.  For information about this error code, see "System Error Codes" in the Windows Development documentation.
Failed to designate the local availability replica of availability group 'XX' as the primary replica.  The operation encountered SQL Server error 41018 and has been terminated.  Check the preceding error and the SQL Server error log for more details about the error and corrective actions. (.Net SqlClient Data Provider)

------------------------------
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=13.00.2164&EvtSrc=MSSQLServer&EvtID=41018&LinkId=20476

------------------------------
Server Name: XX
Error Number: 41018
Severity: 16
State: 0
Line Number: 1


------------------------------
Program Location:

   at Microsoft.SqlServer.Management.Common.ConnectionManager.ExecuteTSql(ExecuteTSqlAction action, Object execObject, DataSet fillDataSet, Boolean catchException)
   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteNonQuery(String sqlCommand, ExecutionTypes executionType, Boolean retry)

We have 3 nodes: 2 local and 1 in Azure.
I have added SYSTEM as sysadmin on all three nodes. Before failing over, I change the availability mode to synchronous commit and make sure there will be no data loss.
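
For anyone reproducing this, the pre-failover steps described above can be sketched in T-SQL roughly as follows. This is only an illustration: 'XX' and 'XXX' are the placeholder group and instance names from the error text, and the statements assume the availability mode is changed from the current primary and the failover is issued on the target secondary.

```sql
-- On the current primary: switch the target replica to synchronous commit.
-- [XX] and N'XXX' are the placeholders from the error message, not real names.
ALTER AVAILABILITY GROUP [XX]
    MODIFY REPLICA ON N'XXX'
    WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- Check that every database on every replica is ready to fail over
-- without data loss (is_failover_ready = 1).
SELECT ar.replica_server_name,
       drcs.database_name,
       drcs.is_failover_ready
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drcs.replica_id;

-- On the target secondary: perform the planned manual failover.
ALTER AVAILABILITY GROUP [XX] FAILOVER;
```

The last statement is the one that fails with error 41018 in the output above; the first two only confirm that the replicas themselves are synchronized.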

Best Answer

We found the issue: a quorum disk resource was attached to the AGCLUSTER, and it could not be moved to the other nodes.

We removed the disk and the issue is gone.
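
As a rough starting point for others who hit the same error, the cluster's quorum configuration can be viewed from within SQL Server using the two DMVs below. This is a sketch rather than a full diagnosis: it shows the quorum type and the voting members (including a disk witness, if one exists), but it does not list which resources belong to the availability group's cluster role, so that part would still need to be checked in Failover Cluster Manager.

```sql
-- Quorum model and state as seen by this SQL Server instance.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Cluster members (nodes and any witness) with their state and quorum votes.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```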

Thank you everyone for your support and your valuable time!!