Sql-server – SQL Server 2012 failover cluster node won’t start cluster services

failover, high-availability, sql-server, sql-server-2012, windows-server

We have a two-node SQL Server failover cluster set up in our environment.

The node2 SQL Server machine got shut down.

I turned it back on and reconnected it to the failover cluster.

It connected successfully; however, the cluster service on node1 won't start, even when I start it manually.

Under Network connections on the failover cluster, the SQL Ethernet is unavailable. When I run ipconfig on node1 I can see only the machine's own IP address; on node2 I can see the Windows cluster IP and the SQL cluster IP. I ran the cluster validation report, and this is the message that stood out:

Node SQL-01.ourserver.com is not reachable from node SQL-02.ourserver.com. It is necessary that each cluster node can
communicate each other cluster node by a minimum of one network path (though multiple paths
are recommended to avoid a single point of failure). Please verify that existing networks
are configured properly or add additional networks.

Node2 also got the same error, saying node2 is not reachable from node1.

What can I do to get the cluster service on node1 started again?

I'm not sure whether this information is helpful, but after the Windows and SQL clusters were created, we changed the names of the machines.

Best Answer

I have had similar, though not exactly the same, issues, described here, here and here.

What can I do to get the cluster service on node1 started again?

When a node finds itself alone in a Windows Server failover cluster, it loses quorum.

To start the Windows failover cluster service without quorum, you then need to force it to start.
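To make the quorum loss concrete, here is a minimal sketch of the majority rule. It assumes a simplified model in which every node (and an optional witness) carries one vote, and it ignores dynamic quorum. The function name `Test-HasQuorum` and the vote counts are hypothetical, chosen only for illustration; they are not part of the FailoverClusters module.

```powershell
# Simplified majority-vote model (assumption: one vote per member, no dynamic quorum).
function Test-HasQuorum {
    param(
        [int]$TotalVotes,   # votes configured in the cluster (nodes plus any witness)
        [int]$OnlineVotes   # votes that can currently see each other
    )
    # Quorum requires a strict majority: more than half of all configured votes.
    return (($OnlineVotes * 2) -gt $TotalVotes)
}

Test-HasQuorum -TotalVotes 3 -OnlineVotes 2   # two nodes + witness, one node lost -> True
Test-HasQuorum -TotalVotes 3 -OnlineVotes 1   # one node alone, witness unreachable -> False
Test-HasQuorum -TotalVotes 2 -OnlineVotes 1   # two nodes, no witness, one node alone -> False
```

In the last two cases the surviving node cannot form a majority on its own, which is the situation where the cluster service has to be force-started.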

This is very important, because generally you would have something like this:

Windows cluster name: the name that all the applications try to connect to. Let's say it is called the_server. In reality the_server does not exist as a machine; what exists is node1 and node2 (plus maybe a quorum disk or shared storage for quorum purposes). So even if you have only one node left, you need the Windows failover cluster service running so that all applications can find the_server.

The way I would do that is using PowerShell:

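Import-Module FailoverClusters

# Name of the node to force-start
$node = "AlwaysOnSrv02"
# Make sure the cluster service is stopped, then force it to start without quorum
Stop-ClusterNode -Name $node
Start-ClusterNode -Name $node -FixQuorum

# Guarantee that the node is a voting member of the quorum
(Get-ClusterNode $node).NodeWeight = 1

# Show the resulting node properties
$nodes = Get-ClusterNode -Cluster $node
$nodes | Format-Table -property NodeName, State, NodeWeight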

This is described in detail here:

Force a Windows Server failover cluster to start without a quorum

To force a cluster to start without a quorum:

  1. Start an elevated Windows PowerShell via Run as Administrator.
  2. Import the FailoverClusters module to enable the cluster cmdlets.
  3. Use Stop-ClusterNode to make sure that the cluster service is stopped.
  4. Use Start-ClusterNode with -FixQuorum to force the cluster service to start.
  5. Use Get-ClusterNode and set its NodeWeight property to 1, the value that guarantees that the node is a voting member of the quorum.
  6. Output the cluster node properties in a readable format.
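As an alternative sketch, the same forced start can be done through the service control commands instead of the cluster cmdlets. This assumes the Cluster service has its default service name, `clussvc`, and that the commands are run from an elevated prompt on the node you want to bring up:

```powershell
# Stop the Cluster service on the local node (harmless if it is already stopped).
net.exe stop clussvc

# Start the Cluster service and force it to form a cluster without quorum.
net.exe start clussvc /forcequorum
```

This covers only steps 3 and 4; the node weight still has to be checked separately, as in steps 5 and 6.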

First, get hold of the Failover Cluster Manager application, have a look at what we have got, and check the servers inside the nodes.

This is the normal way to manage the clustering services.

However, when there is no quorum we need to force the service to start. For that you need to follow this link.

Just don't forget the way to run PowerShell (elevated, as in step 1 above).

Run the script command by command. First this:

Import-Module FailoverClusters  

Then

$node = "SQLPROD2"  
#Stop-ClusterNode -Name $node  
Start-ClusterNode -Name $node -FixQuorum  

(Get-ClusterNode $node).NodeWeight = 1  

$nodes = Get-ClusterNode -Cluster $node  
$nodes | Format-Table -property NodeName, State, NodeWeight   

And all is good with only one node up and running, at least until we add a new node: there is no downtime, and no application is upset about being unable to connect to the_server.
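A quick way to confirm that state is sketched below. It assumes an elevated PowerShell session on the surviving node, and it uses only read-only queries, so it changes nothing in the cluster:

```powershell
Import-Module FailoverClusters

# Is the Cluster service itself running on this node?
Get-Service -Name ClusSvc | Format-Table Name, Status

# Which nodes are up, and which of them carry a vote?
Get-ClusterNode | Format-Table Name, State, NodeWeight

# Are the clustered roles (for example the SQL Server group) online, and on which node?
Get-ClusterGroup | Format-Table Name, OwnerNode, State
```

If the roles show as Online with the surviving node as owner, applications connecting through the cluster name should be unaffected.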