SQL Server – File share witness for a 3-node multi-subnet AG setup

availability-groups, clustering, sql-server

I am planning for a 3 node AG, 2 in DC and 1 in San Jose. Host OS will be Windows Server 2012 R2.

We don't have an option of a 3rd site, so my question is where can I place the FSW if I have to do a planned manual failover?

Also, once manual failover is performed, should I be disabling the votes of the nodes in the primary site?

Please advise.

Best Answer

I am planning for a 3 node AG, 2 in DC and 1 in San Jose. Host OS will be Windows Server 2012 R2.

I'm going to assume that the 2 in DC are your favored site (I feel dirty typing that) and San Jose is truly only for DR.

We don't have an option of a 3rd site...

That's a shame - if you were running Windows Server 2016 you could use a cloud witness.

...so my question is where can I place the FSW if I have to do a planned manual failover?

Quite literally... anywhere. The witness may act as a voter (whether it has a vote depends on whether dynamic quorum is turned on, the current state of nodes and votes, etc.) and will be used for arbitration.

Doing a manual failover involves having quorum and a synchronized secondary. Nothing in there requires a witness.

Let's dig a little deeper. The point of having a witness is to help with having enough votes to keep quorum and for arbitration so that you don't split brain. Since you only have the single vote (I'm assuming defaults, here) in San Jose - it's a moot point. If you truly had a DR issue you'd have (from the point of view of SJC) a single vote out of a possible 3, which isn't enough to have quorum and thus the cluster service would shut down. It doesn't matter what the AG settings are, that's how WSFC works. You'd have to force quorum, regardless of whether you can see the witness or not.
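To make the arithmetic concrete, here is a minimal Python sketch of the majority rule. It is a simplified static model that assumes one vote per node and ignores dynamic quorum and dynamic witness; the node names and the `has_quorum` helper are made up for illustration and are not part of any WSFC API.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Strict majority rule: more than half of all configured votes."""
    return reachable_votes > total_votes // 2


# Hypothetical node names; one vote each (assumed defaults).
votes = {"DC1": 1, "DC2": 1, "SJC1": 1}
total = sum(votes.values())

# DC site is lost: only the San Jose node can still be reached.
sjc_only = votes["SJC1"]
print(has_quorum(sjc_only, total))  # False: 1 of 3 is not a majority

# San Jose is lost: both DC nodes are still up.
dc_only = votes["DC1"] + votes["DC2"]
print(has_quorum(dc_only, total))   # True: 2 of 3 is a majority
```

In this model the San Jose node can never hold a majority on its own, which is why forcing quorum is the only way to bring it up after losing DC.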

We're not going to get into everything with WSFC but that's the gist of your current setup. Thus, I'd put the witness where it will do the most good, which would be at site DC.

Also, once manual failover is performed, should I be disabling the votes of the nodes in the primary site?

What issue are you trying to solve? If you do a manual failover then it is implied that the replicas are synchronous commit and synchronized, and also that the cluster has quorum.

If the issue that you are trying to solve is that you don't want a failover to happen or you want the AG to stay in SJC after the failover - then yes, you'll need to adjust the node weight to 0 for the DC replicas. However, if this is done, you're a single point of failure again as any hiccup on the SJC node means your cluster is down (I mean, you removed the voters - we have a bunch of watchers but no voters).
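The trade-off can be sketched with a simplified majority model in Python (static votes only, no dynamic quorum; the node names and helpers are hypothetical, not a WSFC API). With the DC votes set to 0, San Jose holds the only vote, so it alone decides whether the cluster stays up.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Strict majority rule: more than half of all configured votes."""
    return reachable_votes > total_votes // 2


# Hypothetical layout after setting the DC node weights to 0.
votes = {"DC1": 0, "DC2": 0, "SJC1": 1}
total = sum(votes.values())


def cluster_up(reachable_nodes: set) -> bool:
    """True if the reachable nodes together hold a majority of the votes."""
    reachable = sum(votes[name] for name in reachable_nodes)
    return has_quorum(reachable, total)


print(cluster_up({"SJC1"}))        # True: the only voter is reachable
print(cluster_up({"DC1", "DC2"}))  # False: both DC nodes are up but hold no votes
```

The second case is the single point of failure: two healthy nodes in DC cannot keep the cluster running once the lone voter in SJC goes away.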

I mentioned the manual failover with 2 scenarios in mind: 1) To fail over the AAG to San Jose in case any maintenance activities have to be performed on the DC nodes.

In that case I'd do maintenance on one node at a time and keep it within DC, not immediately go out to SJC. You could also ADD a node at any time in DC just for that purpose. If your application or user base is mostly in DC, the added latency to SJC might not be palatable. I can't tell you if it will or won't be.

Summary: First I would try to keep the AG in DC and only use SJC as DR, unless all replicas are synchronous commit, in which case it doesn't really matter. Taking away the votes would help keep the cluster up after a failover to SJC, but still wouldn't help all that much.

2) In case of DR - as you mentioned - a forced failover is the only option as I do not have enough votes to sustain quorum. Is it a better idea to have 2 nodes at San Jose as well?

"It depends™" Again, what are you trying to solve? I wouldn't over-engineer a solution to try and solve every possible edge case or problem. Pick your top 3-5 true issues and solve for those.

We do have a 3rd datacenter in Atlanta - can that be used as a third site for the witness?

So you lied to me! I see how it is...

Yes, you can and should absolutely use that location for a FSW :)