Sql-server – AG with overlapping AG groups

availability-groupshigh-availabilityperformancesql server

We are currently trying to implement a two node AG where node 1 has two AG's. The first AG is primary to node 1 and secondary to node 2, the second AG is primary to node 2 and secondary to node 1. I was told that this is the set up we will be using because we wanted to make use of all resources, for example node 1 will be responsible for 3 databases and node 2 will have 2 databases, I guess this is a pseudo active/active set up. I personally don't know the use case for this set up, but it does seem a little bit over the top. Any pro's, con's or resource help would be helpful.

Best Answer

I have seen setups that run 2 AG's in a 3 node cluster. Usually all the databases are running on the primary server, but they can (and do) failover each AG separately. The setup will work, the question now is, do you want to do it?

Some problems you will have to consider:

  1. Extra admin of running and debugging 2 AG's (and confusion of what runs where)
  2. Extra overhead on SQL of running a 2nd AG (minimal impact for the small number of databases)
  3. Extra licensing costs. SQL isn't cheap, if you have Software Assurance you can run a non-readable secondary for free. If you want to split the load you need to fully license both servers. (Note: I'm not a licensing expert)
  4. The big one: Can both servers run a full load on all of the databases should 1 server fail?

The last point is the important one, and the one that makes this setup less desirable. If you are splitting the 2 AG's across 2 servers because 1 server doesn't have the resources, then what are you going to do when 1 server crashes and the AG fails over? You now have one server running all of the databases. Can that server cope with the load? If so, keep it simple and use it all the time. If not, your AG is not going to do the HA/DR that you'd hoped it would.

One scenario I did see where this setup worked was a database running on the secondary that consumed a lot of CPU due to batch processing. The business decided that should either node fail, they wanted to still be able to query the data (very small load on SQL) but were happy to stop the batch processing of new data. They chose to go with 2 AG's and added a whole lot of code to their batch processing so it could turn itself off when it detected both AG's were being served from the same server. In normal operations this took a heavy CPU load off their primary OLTP server, but at the cost of having to stop a fairly important process should 1 node fail.