I have had issues similar (though not identical) to those described here, here and here.
What can I do to start the cluster service on node1 again?
When a node finds itself alone in a Windows Server failover cluster, it loses quorum.
To start the Windows failover cluster service without quorum, you then need to force it to start.
This is very important, because you would typically have a setup like this:
Windows cluster name: the name that all the applications connect to. Let's say it is called the_server.
But the_server does not actually exist; what exists is node1 and node2 (plus perhaps a quorum disk or shared storage for quorum purposes). So even if you have only one node left, you need the failover cluster service running so that all applications can still find the_server.
The way I would do that is with PowerShell:
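Import-Module FailoverClusters
# Node to bring back online (node names cannot contain spaces)
$node = "AlwaysOnSrv02"
# Make sure the cluster service on the node is stopped
Stop-ClusterNode -Name $node
# Force the cluster service to start without quorum
Start-ClusterNode -Name $node -FixQuorum
# Make the node a voting member of the quorum
(Get-ClusterNode $node).NodeWeight = 1
# Show the state and vote of every node in the cluster
$nodes = Get-ClusterNode -Cluster $node
$nodes | Format-Table -property NodeName, State, NodeWeight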
This is described in detail here:
Force a Windows Server Failover Cluster to Start Without a Quorum
To force a cluster to start without a quorum:
- Start an elevated Windows PowerShell via Run as Administrator.
- Import the FailoverClusters module to enable the cluster cmdlets.
- Use Stop-ClusterNode to make sure that the cluster service is stopped.
- Use Start-ClusterNode with -FixQuorum to force the cluster service to start.
- Use Get-ClusterNode to set the NodeWeight property to 1, which guarantees that the node is a voting member of the quorum.
- Output the cluster node properties in a readable format.
First, open Failover Cluster Manager and take a look at what we have got.
This is what it looks like:
Check the nodes (servers) in the cluster:
This is the normal way to manage the cluster service.
However, when there is no quorum, we need to force the service to start.
For that, follow this link.
Just remember how to run the PowerShell: run this script one command at a time.
First this:
Import-Module FailoverClusters
Then:
$node = "SQLPROD2"
# Stop-ClusterNode is commented out here; run it first if the cluster service on the node is still running
#Stop-ClusterNode -Name $node
Start-ClusterNode -Name $node -FixQuorum
(Get-ClusterNode $node).NodeWeight = 1
$nodes = Get-ClusterNode -Cluster $node
$nodes | Format-Table -property NodeName, State, NodeWeight
And all is good: only one node is up and running until we add a new node, but there is no downtime and every application can still connect to the_server:
The simplest answer is no: you won't be able to achieve this with a 3-node cluster in the manner described.
The reason is quorum. Assume 3 nodes running Windows Server 2012 R2, 2 at the primary site and 1 at the DR site. Dynamic quorum is on by default; it automatically adjusts node weights when a node fails. Dynamic witness is also on by default; it changes the witness vote to keep the total number of votes odd.
However, dynamic quorum only helps when fewer than half of the voting nodes go down at the same time. If 50% or more of the voting nodes go down at once, there aren't enough voters left to keep quorum, or for dynamic quorum to decide that this isn't a split-brain scenario.
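To make the vote count concrete, here is the arithmetic for the 3-node layout, as a rough sketch. It assumes the cluster has a witness, that dynamic witness has set the witness vote to 0 because the 3 node votes are already an odd total, and that both primary nodes fail at the same instant so dynamic quorum has no chance to adjust weights first. Quorum needs strictly more than half of the votes $V$:

$$
\begin{aligned}
V &= 3 \text{ node votes} + 0 \text{ witness votes} = 3 \\
\text{quorum} &: v_{\text{surviving}} > \tfrac{V}{2} = 1.5 \;\Rightarrow\; v_{\text{surviving}} \ge 2 \\
\text{primary site lost} &: v_{\text{surviving}} = 1 < 2 \;\Rightarrow\; \text{the DR node cannot keep quorum}
\end{aligned}
$$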
How could you potentially achieve this?
If you can put 2 nodes at the primary site, 2 nodes at the DR site, and a witness at a third site, then it should do what you're looking for.
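The same arithmetic for the 2 + 2 + witness layout, again as a sketch: it assumes dynamic witness gives the witness a vote because the 4 node votes are an even total, so $V = 5$ and quorum needs at least 3 surviving votes:

$$
\begin{aligned}
V &= 4 \text{ node votes} + 1 \text{ witness vote} = 5 \;\Rightarrow\; v_{\text{surviving}} \ge 3 \\
\text{primary site lost} &: 2 \text{ (DR nodes)} + 1 \text{ (witness)} = 3 \ge 3 \;\Rightarrow\; \text{quorum kept} \\
\text{sites partitioned} &: 2 + 1 = 3 \text{ on the side that locks the witness}, \quad 2 < 3 \text{ on the other side}
\end{aligned}
$$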
Is there any way to configure this so that automatic failover occurs if the primary and DR sites lose connectivity, or if the primary site goes down entirely?
From the DR site's perspective, these two cases look the same. Whether it loses its connection to the servers at the primary site or the primary site goes down, the DR nodes can no longer "see" the primary nodes, only their local ones. This results in a race to acquire the lock on the witness, and whichever side gets the lock first wins.
Windows Server 2012 R2 has an additional cluster setting, LowerQuorumPriorityNodeID,
which can be used to favor one side or the other when these situations happen.
It could be the issue described in INF: AlwaysOn – The secondary database doesn’t come automatically when the primary instance of SQL Server goes down by Arvindh Kalidasan - Support Engineer, Microsoft GTSC.
The workaround posted there is: