Sql-server – Disk errors in one of the SQL cluster nodes and SQL patching was failing with a lot of errors related to disk and permission issues

sql server

We have two cluster nodes in a SQL cluster and the application use this cluster is Microsoft Identity Manager.

Since few month both nodes ( Node 1 and Node2) are throwing disk error and SQL team is tried fixing that multiple time by running CHKDSK. Recently noticed that, resources are unable to move to primary node(Node1) and they did patching that also failed.But somehow they are able to move the resources from Node 2 to primary Node1.. But SQL team is suggesting to rebuild the server( Node1) as the server still has Disk error and patch failures.

So my question here would be, wont that disk replacement work here in this case and copy all the data from old to new disk ?

Best Answer

There is possibility of disk replacement, for detailed steps..

Seems your team decision is correct as you have been tried to apply patches that been ran through failures.

Edit (Including disk replacement steps summary from same source - referral link):

  1. Take backups of all the databases including System DBs. [ONLY for disaster recovery]
  2. Present all the drives from the new storage to the Windows Cluster. These disks must be visible under “Available Storage” Failover Cluster Manager
  3. Now add these disks to the SQL Cluster Role/Group.
  4. From the SQL Server dependencies, remove the drive letters pointing to the old storage
  5. Take SQL Server Resource to an OFFLINE State
  6. Make sure that the old disks should be online. Now start copying all the folders as is from old storage to the new storage. DO NOT change any files or folder structures.
  7. Once all the files are copied, you need to swap the drive letters. Such that the new drives will get the old drive letters and vice-versa [This can be directly from Failover Cluster Manager] - This might not applicable in your case
  8. Add the disks to the SQL Server dependencies
  9. Make sure the SQL server service account has appropriate permissions on new disk/folder especially for the folders that are pointed in SQL Server - startup parameters

  10. Bring the SQL Resource ONLINE.

  11. Test connectivity from applications.
  12. Once testing is completed thoroughly, disks from old storage can be removed from the Failover Cluster Manager