SQL Server – Adding a Geographically Dispersed Third Node to an AAG Cluster

availability-groupsdisaster recoveryhigh-availabilitysql serversql-server-2012

We have a 2 node Always setup in our data center at DC and I have been tasked to setup a third node in this cluster in San Jose. What are the considerations I need to take to make this happen? I read the below given document which provides some insights. Is this document sufficient to consider as a reference?

http://download.microsoft.com/download/D/2/0/D20E1C5F-72EA-4505-9F26-FEF9550EFD44/SQLServer2012_MultisiteFailoverCluster%20(2).docx

My major questions would be where do I place the quorum (probably a disk and file share witness) . And if Asynchronous commit is the only option available because of the geographical distance is it viable to just a have a manual failover option?

Best Answer

My major questions would be where do I place the quorum (probably a disk and file share witness)

Quorum is a majority agreement, it is not a resource. It used to be a resource until 2003 where the only option was a disk, but this hasn't been the case since Server 2008.

There can only be a single witness, you can't have both a disk and a fileshare. The choice of which to have is going to be based on your favorable scenario during a failure and your internal implementation limits. For example, a company I work with has said absolutely no shared disk to any server, so that rules out a disk as a potential witness.

Most organizations want the whatever is considered the optimal side for their business (though, proper planning and implementation there wouldn't be an "optimal" side) to stay running. In most cases we call this the "primary" side. Generally speaking the witness should be closest to the location that should stay up in case of a true split situation when using windows server 2012R2 or lower. If you're using Windows Server 2016 you will want to look into sites.

And if Asynchronous commit is the only option available because of the geographical distance is it viable to just a have a manual failover option?

Again, depends. Will this ever be failed to? Maybe just in the instance of a true regional DR event where RTO trumps RPO? Will there be any reporting on this new replica? If so, does the reporting require a specific data freshness SLA?

Synchronous will obviously have an overhead and your round trip time (rtt) latency will play the largest factor for long distance AGs, followed next by disk. Asynchronous will obviously not have those problems in terms of reducing the throughput of the workload but will still feel it in terms of data freshness if it is used for reporting.

Remember, if it is a planned downtime you can change the availability mode before performing any activities.