I'm testing a PowerShell Disaster Recovery script and trying to fail over to an off-site AG DR (asynch) replica with this command
ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS
On several clusters this works fine but on a couple of other ones it usually fails (but not always) with the error:
Failed to move a Windows Server Failover Clustering (WSFC) group to
the local node (Error code 5023). The WSFC service may not be running
or may not be accessible in its current state, or the specified
cluster group or node handle is invalid. For information about this
error code, see "System Error Codes" in the Windows Development
documentation.Failed to designate the local availability replica of
availability group 'MyAG' as the primary replica. The operation
encountered SQL Server error 41018 and has been terminated. Check the
preceding error and the SQL Server error log for more details about
the error and corrective actions.
We have to force the quorum simulating the situation where the async DR replica can't communicate with the other replicas: https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/force-a-wsfc-cluster-to-start-without-a-quorum?view=sql-server-2017
A comparison of the clusters that work vs. those that don't with Get-Cluster
and Get-ClusterGroup
doesn't reveal any major differences.
This failover has worked successfully on some clusters but fails 99% of the time on two of the clusters. It has worked on these two clusters, but rarely. One of the common issues I found from a Google search is permissions for NT Authority\System
(see https://dataginger.com/2014/10/28/sql-server-failed-to-bring-availability-group-availability-group-name-online/), which I confirmed it has.
I ran a trace and see the same errors and no other clues in the trace.
If nothing turns up I may have to open a Microsoft ticket to engage their support.
Any thoughts?
Thanks for your help!
Best Answer
This error may happen if your SQL Server service is running under user who does not have the needed permission on your servers or the account is locked. try to use another account and make sure account is unlocked.
If that did not fix the issue then you need to edit your regisery (make sure you take a backup first)
regedit
then press okHKEY_LOCAL_MACHINE\Cluster\Resources\<GUID>\Parameters
Create the following registry values in the
Parameters registry key:
For a default instance of SQL Server: InstanceName
Value Name: InstanceName
Value Type: REG_SZ
Value Data: MSSQLSERVER
VirtualServerName
Value Name: VirtualServerName
Value Type: REG_SZ
Value Data:
For a named instance of SQL Server:
Value Name: InstanceName
Value Type: REG_SZ
Value Data:
VirtualServerName
Value Name: VirtualServerName
Value Type: REG_SZ
Value Data:
4- quit registery editor