Sql-server – SQL Server Cluster and Windows Server Storage Replication

clusteringsql serversql-server-2019windows-serverwindows-server-2019

I am currently investigating on how to setup a SQL Server stretch cluster in combination with storage replication for which I want to use Windows Server Storage Replication. However, I am no sure on how to configure the storage replication exactly and maybe you can help. 🙂

I do not want to use Always On Availability Groups. In the end I want to use DTC and that does not seem to be supported in combination with Always On or SQL Server mirroring. The purpose of my setup is to have SQL Server highly available on site A and have a DR copy on site B, which also needs to be highly available if that is made active in case of a fail over. I configured a stretch cluster and that works fine, but I am struggling on how to configure the storage replication in combination with the CSVs.

I cannot find any documentation on the complete setup. I can find docs (MS and community) on how to configure Storage Replication in combination with Scale Out File Server or Hyper-V, but that does not talk on how configure the CSVs exactly and how to fail over to the other site. Every document says regarding the storage something like 'You need manual intervention to fail over'. I have read some documentation from Dell and HPE on how they use their storage replication solution, but that does not give me those details. How and what exactly you need to do on the Windows cluster side is not described. Secondly, they use their own mechanism to make sure the storage blocks are synced in the correct order.

I have 2 scenario's I tried, but these fail. I use my home lab here, so all servers are VMs and therefore use VHDs themselves. So the physical disk in scenario A, for example, is a VHD of a file server VM. All servers use Windows Server 2019.

Scenario A:

  • Configure Storage Replication between a physical disk on iSCSI target
    server A and B
  • Create 2 iSCSI disks on that physical disk and make
    them CSV in the cluster

In this scenario the physical disk will replicate the blocks of the iSCSI disks to the other site. This works fine, but there are several issues if I understand correctly:

  1. The iSCSI disks on Site B cannot be added to the cluster, because they are read-only (not even visible) due to the Storage Replication configured.
  2. In order to fail over, I cannot stop the cluster, switch the replication direction, add the iSCSI disks on Site B as CSV to the cluster. It seems that I need to remove the LUNS first and delete the iSCSI CSV disks from Site A from the cluster? Then add them again at Site B? This seems very cumbersome to me, but maybe this is how this should be done in order to fail over? Or am I missing something here?

Scenario A

Scenario B:

  • Configure Storage Replication between CSVs

In this scenario I create some iSCSI disks and add them as CSV to the cluster. That works fine. Then I wanted to configure replication via the Failover Cluster Manager. First add them as CSV, then right click on one and select 'Replication'.

This scenario fails, because it tells me that the disks are not of the same size. However, they are exactly the same size (I use the same Powershell command to create them) and are formatted with the same NTFS Allocation Unit size. Maybe I am missing something here?

enter image description here

So basically my question is how I can get Windows Server Storage Replication to work correctly in combination with the cluster configuration I am using?

Best Answer

I finally managed to figure it out. I think information is missing, hard to find and no described in much detail. So for me I was a bit in the dark when figuring this out. Also, I think I had some tunnel vision at the start thinking that storage replica works in a certain way. MS has a different way of configuring it and the way it works/ you are supposed to use it.

Scenario A

Possibly an option, but is cumbersome and I don't think MS will support this. I also have not tested this. In order to fail over you probably need to:

  1. Stop the cluster
  2. Switch the replication direction
  3. Add the iSCSI disks to the iSCSI target server at site B as an available iSCSI LUN
  4. Add the LUNs to the SQL Servers again
  5. Add the LUNs to the cluster as CSV, so they will be new as far as the cluster is concerned
  6. Potentially detach and re-attach the databases, because the path could have changed if you have not renamed your volume folder names in C:\ClusterStorage\ when intially setting up the cluster

If someone could comment on this who has more experience on this, that would be great.

Scenario B:

This is the way this is intended to be used. In order to correctly setup the configuration you need to:

  1. Create a multi-site cluster and make sure Windows Server Datacenter version is used for all nodes. Else it will not support replication.

  2. Configure the storage. What is not in the mentioned documentation is that (per LUN you want to replicate) you need to have the data source disk configured as CSV and the other disks configured as Available Storage. Move all the available storage to the node that is the owner of the source disk on Site A (so that will include the available storage in the other site, Site B. This will bring some disks offline, no worries the Storage Replica Wizard in Failover Cluster Manager you will use later on uses logic that will detect which node has access to what storage

  3. Configure fault domains, which is mentioned in the doc also mentioned by stephenmorris

I think that the fault domains in relation to Storage Replica, are used to determine which nodes are at what site and therefore what storage is accessible at what site. This is because you see some storage tests going on in the background during the Storage Replication wizard. However, I am not sure of this.

New-ClusterFaultDomain -Name SiteA -Type Site -Description "Primary" -Location "Site A"
New-ClusterFaultDomain -Name SiteB -Type Site -Description "Secondary" -Location "Site B"

Set-ClusterFaultDomain -Name Node1 -Parent SiteA
Set-ClusterFaultDomain -Name Node2 -Parent SiteA
Set-ClusterFaultDomain -Name Node3 -Parent SiteB
Set-ClusterFaultDomain -Name Node4 -Parent SiteB

(Get-Cluster).PreferredSite="Seattle"

What they probably forgot in this code is to turn the feature on, which you do by using:

#Enable fault domain awareness
(Get-Cluster).AutoAssignNodeSite=1

Reading the document again at a later time, I am not really sure if this is a requirement, since the only role for the Fault Domains in relation to Storage Replica seems to be to determine what storage is located at what site. It is not mandatory to have this awareness during normal operation? You would need to enable the auto behaviour if you have nodes and disks spread across racks on a single site for example and you want to use Storage Spaces Direct for your storage fault tolerance? Maybe someone could comment on that as well.

  1. Configure Storage Replica

Now what is also not described in the document is that you need to do this via Failover Cluster Manager. Right click on the source disk -> 'Enable Replication'. In the documentation they use Powershell. This works as well, but you need to use the name of the cluster as the source AND destination computer. Not the nodes themselves. Also, for SQL Server you need to add the storage volume to a Consistency Group. You will be given this question in the GUI. The logic used by the wizard will automatically detect what volumes are accessible by which nodes. You will see this happening in the background.

Somewhere in the aforementioned document is a link, which is not easy to find, to a video that has more details: https://channel9.msdn.com/Events/Ignite/2015/BRK3487

Something regarding the consistency groups is not clear to me by the way. I don't see any way how I can add 2 source volumes to 1 consistency group. Or am I just no understanding the concept correctly?