Sql-server – SQL Server Cluster Service Online Pending Status

clustering, failover, sql server, sql-server-2012

I have a two-node SQL Server cluster running SQL Server 2012 Enterprise on Windows Server 2012 R2. The SQL Server service has suddenly stopped coming online, and I don't know what changed. Under Failover Cluster Manager > Roles > SQL > Resource > SQL Server (instance-name), the status shows Online Pending and then Failed.

The cluster shows only the following events:

  • The Cluster service failed to bring clustered role 'SQL Server (instance-name)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
  • Cluster resource 'SQL Server (instance-name)' of type 'SQL Server' in clustered role 'SQL Server (instance-name)' failed.
    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event Viewer:

  • Fault bucket , type 0
    Event Name: Failover clustering resource deadlock
    Response: Not available
    Cab Id: 0

Problem signature:
P1: SQL Server (instance-name)
P2: SQL Server
P3: ONLINERESOURCE
P4:
P5:
P6:
P7:
P8:
P9:
P10:
Attached files:
These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\Critical_SQL Server (instance-name)_217169f34c33c838a6522b285697633e938ebe_00000000_03a6e263
Analysis symbol:
Rechecking for solution: 0
Report Id: 11f7e038-5f26-11e7-80fb-7a2f1397e813
Report Status: 4100
Hashed bucket:

Kindly suggest what could be wrong.

Best Answer

The default restart settings in WSFC are such that a faulting resource is terminated and then retried once before being declared offline.

In this case, the cluster detects the fault, terminates the resource, and then attempts to bring it online again. Storage Foundation for Windows detects that the Disk Group is not present and returns a status of Pending. This creates a deadlock that lasts until either RHS.exe terminates (through hang protection) or the online pending timeout is reached.
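To check whether the resource really is stuck in the online path, it can help to pull the recent cluster log and search it for the resource name. The following is a minimal sketch, not a definitive procedure: the output folder `C:\Temp\ClusterLogs` and the 30-minute window are assumptions, and the resource name is the placeholder used in the question.

```powershell
# Requires the FailoverClusters module (present on cluster nodes or via RSAT).
Import-Module FailoverClusters

# Assumed output folder; change it to any writable location.
$logDir = 'C:\Temp\ClusterLogs'
New-Item -ItemType Directory -Path $logDir -Force | Out-Null

# Generate a cluster log per node covering the last 30 minutes.
Get-ClusterLog -Destination $logDir -TimeSpan 30

# Show the first lines that mention the failing resource (literal match, not regex).
Select-String -Path (Join-Path $logDir '*.log') `
              -Pattern 'SQL Server (instance-name)' -SimpleMatch |
    Select-Object -First 20
```

Run this shortly after a failed online attempt so that the time window covers the failure.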

I therefore recommend increasing both the pending timeout and the deadlock timeout, as described below.

To change the deadlock timeout (values are in milliseconds):

(Get-ClusterResource "Resource Name").DeadlockTimeout = 300000
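Before changing the value, it is worth reading the current settings so the change can be reverted. The sketch below assumes the resource is named as in the question (`SQL Server (instance-name)` is a placeholder) and reuses the 300000 ms value from the command above; adjust both to your environment.

```powershell
# Requires the FailoverClusters module (present on cluster nodes or via RSAT).
Import-Module FailoverClusters

# Placeholder name; use the resource name shown in Failover Cluster Manager.
$resourceName = 'SQL Server (instance-name)'
$resource = Get-ClusterResource -Name $resourceName

# Show the current state and timeouts (both timeouts are in milliseconds).
$resource | Format-List Name, State, OwnerNode, PendingTimeout, DeadlockTimeout

# Raise the deadlock timeout only if it is currently below the target value.
$targetMs = 300000
if ($resource.DeadlockTimeout -lt $targetMs) {
    $resource.DeadlockTimeout = $targetMs
}
```

The guard leaves the setting untouched if it is already at or above the target, so the script never lowers an existing timeout.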

To change the pending timeout, follow these steps:

  1. In the Failover Cluster Management snap-in, if the cluster you want to configure is not displayed, in the console tree, right-click Failover Cluster Management, click Manage a Cluster, and select or specify the cluster you want.
  2. If the console tree is collapsed, expand the tree under the cluster that you want to configure.
  3. Expand Services and Applications (shown as Roles in Windows Server 2012 and later).
  4. Click the clustered service or application that you want to configure the pending timeout for.
  5. In the center pane, right-click the resource for the service or application, click Properties, and then click the Policies tab.
  6. Under Pending timeout, specify the length of time, in minutes and seconds, that the resource can take to change states between Online and Offline before the Cluster service puts the resource in the Failed state.

The default pending timeout is 3 minutes. Adjust it as required.
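The pending timeout can also be set from PowerShell through the resource's PendingTimeout property, which is in milliseconds like DeadlockTimeout. This is a minimal sketch: the resource name is the placeholder from the question, and the 5-minute value is only an illustrative assumption.

```powershell
# Requires the FailoverClusters module (present on cluster nodes or via RSAT).
Import-Module FailoverClusters

# Placeholder name; use the resource name shown in Failover Cluster Manager.
$resourceName = 'SQL Server (instance-name)'

# Assumed example value: 5 minutes, converted to milliseconds.
$minutes = 5
(Get-ClusterResource -Name $resourceName).PendingTimeout = $minutes * 60 * 1000

# Re-read the resource to confirm the new value.
Get-ClusterResource -Name $resourceName | Format-List Name, PendingTimeout
```

With `$minutes = 3` the expression yields 180000, which corresponds to the 3-minute default mentioned above.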