SQL Server – Running Two WSFC Instances on Different Servers

availability-groupsclusteringsql serversql-server-2017

I'm drawing up a proposal for upcoming infrastructure changes. This will include a production server and reports/data warehouse server, each with Always On. To keep hardware and licencing costs down, is it possible to run in a configuration of Server-A running Prod-AG Primary and Rep-AG Secondary, and Server-B running Rep-AG Primary and Prod-AG Secondary?

I presume each server would need 2x of the following WSFC instances,sql instances, AG's, listeners, DNS names/ports.

I hope this makes sense, here's a diagram of what I think it will looks like.

In the case of a fail-over on either node, the workload/business need isn't that great that running off the same server for a couple of hours would be a major issue.

I've only found a couple of mentions of a similar setup that kind of worked but no definitive information from Microsoft or anyone who's successfully ran this setup.

SQL Edition will be 2017, most likely standard, I don't think we'll be approved for Enterprise. OS will be Windows Server 2016 Core.

Best Answer

Yes, you can have more than one availability group on a cluster. Each availability group is completely independent of the others, and can be failed over to any node in the cluster separately. From Configure SQL Server 2012 AlwaysOn Availability Groups for SharePoint 2013:

Because SQL Server 2012 can host multiple availability groups on a single server, you can configure AlwaysOn to fail over to SQL Server instances on different servers. This reduces the need to have idle high performance standby servers to handle the full load of the primary server, which is one of the many benefits of using availability groups.

The is of course relevant regardless of whether SharePoint is involved.

You will not need additional instances--just one default instance of SQL Server on each node. And I think you may be confused about WSFC instances. You simply set up the cluster, install SQL Server on each node, and then create the availability groups. When you create the AGs, the setup process will create all of the cluster roles and resources that are required.

Search YouTube and watch one being set up and I think that will make things clearer (https://www.youtube.com/watch?v=VKCqRgqLAuo). The documentation gets murky because it continually refers to WSFC instances and availability groups, but if you're using straight AGs, you aren't going to be using WSFC instances. It makes it appear far more complex than it is.

Related Solutions

SQL Server 2012 – Fix AlwaysOn Availability Group Automatic Failover

If I disconnect DEV-AWEB5

Define "disconnect", if you will. My guess is you kept the box up but took SQL Server down.

I cannot connect to the Group Listener (DevListener), but I can ping it and it will respond to my ping

That's because the listener is just a virtual network name (VNN) within the WSFC cluster resource group for the represented availability group. Your DEV_AWEB5 node still owns the cluster resource group, but it's just the AG cluster resource most likely that is in a failed state. The VNN must still be online (expected behavior). It's simply pointing to whatever node is owning that resource group (in this case, DEV-AWEB5). In fact, if you had PowerShell remoting enabled, and you ran the following:

Invoke-Command -ComputerName "YourListenerName" -ScriptBlock { $env:computername }

Likewise, if you can RDP into DEV-AWEB5 (provided you have the capability and accessibility, etc.) then you'd be able to RDP using the listener name (mstsc /v:YourListenerName). It's just a VNN.

The return of that would be the computer name of your owning node.

By all of your symptoms, I'd be willing to bet that you've reached your failover threshold. The failover threshold determines how many times the cluster will attempt to failover your resource group in a specified time period. The default of these values max failovers n - 1 (where n is the count of nodes) in a period of 6 hours. You can see that through the following WSFC PowerShell command:

Get-ClusterGroup -Name "YourAgName" |
    Select-Object Name, FailoverThreshold, FailoverPeriod

That just gives you the settings (which you can modify if you so choose, of course).

The best way to prove that this is the case for you, you would need to generate the cluster log (the system event logs only go into detail as far as " has failed", or something like that).

Get-ClusterLog -Node "YourClusterNode" -TimeSpan <amount_of_minutes_since_failure>

That'll by default get put into the "C:\Windows\Cluster\Reports" folder, and the file is called "Cluster.log".

If you were to open up that cluster log, you should be able to find the following string in there, indicating exactly what happened and why it happened:

Not failing over group [YourClusterGroupName], failoverCount [# of failovers], failover threshold [failover threshold value], nodeAvailCount [node available count].

The above message is simply WSFC telling you that it will not failover your group because it's happened too much (you hit the threshold).

Why does this happen? Simply to prevent the Ping-Pong effect of cluster resources going back and forth too frequently between nodes.

Whereas this would be common to hit these thresholds in failover testing, in production it would typically point to a problem that should be investigated.

Scheduling Backup Jobs for an Availability Group in SQL Server 2014

In a MS SQL Server 2014 AlwaysOn AG setup, I would like to schedule backup jobs for a given availability group. The ultimate goal is to have the regular backups run on the synchronized secondary and, most importantly, not bound to the availability of a particular secondary node.

This can be accomplished without issue. Just keep in mind that your systems should be spec'd to be able to take the full load of the synchronization mechanics for AlwaysOn Availability Groups, plus the redo needed, plus the backups. The last thing that you want is to overwhelm your IO system and cause outages or a large redo queue. This is assuming it's not actively being used as a readable secondary - if it is, you'll want to factor that in and watch for blocked redo threads.

1.I want to run the scheduled action on a secondary just once, but I have multiple secondaries 2.I want to ensure that it is ultimately run, regardless if there are any secondaries left at all - if no secondaries are left, it should run on the primary

There is an Availability Group setting and a system function that can be used in order to accomplish this within your expected guidelines.

The first is the setting, per AG, called AUTOMATED_BACKUP_PREFERENCE which has four different options. The one you're describing is called SECONDARY which will prefer any secondary node over the primary node. The secondary node chosen is by the BACKUP_PRIORITY which is set per replica ranging from 0-100, where the higher the number the more weight it has. If there is a tie in weights for secondary replicas, the replica that is sorted first given the system collation will be chosen. If all secondary replicas should fail, the primary replica will be chosen.

The second part of the equation is the system function used to check if the replica that is running the job is the preferred replica based on the values in the previous paragraph. This system function is called sys.fn_hadr_backup_is_preferred_replica(). Given a database name (any database in the AG) a value of 0 will be returned if it is not the preferred replica and a value of 1 if it is preferred.

When creating the agent job to accomplish this, you'll want to wrap the backup logic in an IF conditional to check for the preferred replica. That's it. Put the identical agent job on all replicas.

Please be sure to test that this truly is what you want and expect.

Best Answer

Related Solutions

SQL Server 2012 – Fix AlwaysOn Availability Group Automatic Failover

Scheduling Backup Jobs for an Availability Group in SQL Server 2014

Related Question