Sql-server – SQL Server Availability Group Design Question/RFC

availability-groupssql server

We are in the early stages of planning an upgrade/redesign of our SQL environment and I am seeking advice/feedback.

The scenario:

  • Current SQL Server 2008R2 EE on Windows Server 2008 R2, blade servers with databases on SAN, some mirroring
  • Target SQL Server 2012 R2 EE (should I consider 2014?) on Windows Server 2012R2, blade server with SAN storage

  • SQL Server 1 to be primary Infrastructure database server (the IT databases, vCenter, AV, SCOM etc.)

  • SQL Server 2 to be primary user database server
  • SQL Server 3, secondary server in availability group configuration for Servers 1 & 2
  • SQL Server 4 secondary mirror of AGs in a different data center, 10GBs or more between datacenters
  • SQL servers 1 & 2 should not fail over to each other

Migration process will be a database at a time, the existing environment will be available until transition is complete

Can this be done? Is there a better way to do it? Is more information needed to determine if there is a better way, if so, what? Where can I find some reliable documented guidance?

Thank you!

Best Answer

Yes, other than the 2012R2 bit what you are describing can be done, but to correct some of the terminology I think you are describing the following:

  • A 4 server Window Geo cluster with 3 servers in Data Centre 1 (DC1) and one server in Data Centre 2 (DC2).
  • Instance of SQL 2012 lets call it INST01 on server SQL1 in DC1. This instance will be used for your infrastructure dbs.
  • Instance of SQL 2012 lets call it INST02 on server SQL2 in DC1. This instance will be used for your user dbs.
  • Instance of SQL 2012 lets call it INST03 on server SQL3 in DC1. This instance will be used to run your local AOG replicas in normal running.
  • Instance of SQL 2012 lets call it INST04 on server SQL4 in DC2. This instance will be used to run your DR AOG replicas in normal running.

From an always on perspective you will have:

  • AOG group called ITDBS. This will exist across INST01, INST03 AND INST04. Always On Group ITDBS on INST01 would be the primary holding your infrastructure dbs and INST03 would be synchronous read only replicas (handy for reporting/scaling out) and INST04 would be asynch replicas for your DR scenario.
  • AOG group called USERDBS. This will exist across INST02, INST03 AND INST04. Always On Group USERDBS on INST02 would be the primary holding your infrastructure dbs and INST03 would be synchronous read only replicas (handy for reporting/scaling out) and INST04 would be asynch replicas for your DR scenario.

The above Always On Group configuration assumes 1 AOG for each db type on each server, you will have to consider if you need to subdivide dbs if so just create more Always on groups. An AOG is basically just a way of grouping databases so they fail over as a logical consistent unit (unlike mirroring where there is no consistency across dbs in terms of failing over).

The above scenario gives you high availability with automatic failover if you wish due to synchronous replicas in DC1. It also gives you DR capability albeit with a manual failover to your SQL4 server in DC2.

If SQL1 and SQL2 are big enough you could use each as the others synchronous replicas thus avoiding the need for SQL3 (ie SQL1 sync reps would live on SQL2 and vice versa) but SQL1 and SQL2 would need to be big enough hardware to cater for the loads in the event of failure. This may save you some money. You would need to work out the costs of having SQL3 (hardware and SQL licenses) against the extra CPUs and memory you would need in SQL1 and SQL2 to run the extra load during failure.

From a licensing perspective and storage perspective quite expensive and all 4 servers need full Enterprise SQL licenses and all non shared storage.

One other possible way to save a bit on storage and everything is have a 3 server windows cluster with 2 in DC1 and 1 in DC2. Have SQL1 and SQL2 as a SQL failover active/passive cluster. This would mean in SQL2014 as long as you have Software Assurance for the license you get 28 days free failover to the passive node. This would save a lot on licensing but also storage as the 2 server SQL cluster would be using shared storage. The down side of this option is that failover is not instant as it is in your solution, there is a 20 second delay or so for the cluster service to failover (depends on disks etc). Your Always On Group would then include the FCI and the SQL3 in DC2. Also another downside of having an FCI in an AOG is you lose automatic failover, but the dbs in DC2 are asynch replicas so you wouldnt have feature anyway.

Also think about your quorum design, 3 would be a bit trickier you would need a vote from a witness disk and to use the DR server you would need to force quorum if both DC1 servers were down but once documented it is easy for your ops/dbas to follow.

Lots of info to consider, hope this helps.