SQL Server – AlwaysOn AG versus FCI

availability-groups, azure, clustering, high-availability, sql-server

Throughout MS documentation for AlwaysOn Availability Groups (AG) and AlwaysOn Failover Cluster Instances (FCI) I see the following pattern:

  1. FCI is for HA scenarios.
  2. An AG synchronous secondary replica, co-located with the primary, is for HA scenarios.
  3. An AG asynchronous secondary replica, in a different datacenter, is for DR scenarios.

Here is an example MS link discussing this.

Since option #1 and #2 are both for HA scenarios, how do I decide between them?
If MS published costs and RPO/RTO metrics for both #1 and #2, it would be fairly easy to decide which I want.

Or perhaps there is a different way to understand the ROI differences between these options. For example, perhaps option #2 is best suited for VLDBs and option #1 is best suited for very high transaction volumes. I don't know.

So again, what are the selection criteria a DBA uses to choose between options #1 and #2?

To further complicate things, I know that options #1 and #2 can be combined! When is it wise to combine the two options? When is it pointless to combine them? I know that when these options are combined, the AG no longer supports automatic failover. That is interesting trivia, but it doesn't answer my question.

Incidentally, I intend to provision my final solution in Azure IaaS. If I use an Always On FCI, I will likely create the quasi-SAN using Storage Spaces Direct (S2D).

Update

I've found two articles that give a comparison. The first is MS docs, and the other is Choosing the Right Availability Tech. Both have a chart like this:

╔═════════════════════════════╦══════════════════════════╗
║ FCI                         ║ AG                       ║
╠═════════════════════════════╬══════════════════════════╣
║  * Server Level             ║  * Database Level        ║
╠═════════════════════════════╬══════════════════════════╣
║  * Requires shared storage  ║  * Uses direct           ║
║     (SAN or Storage Spaces  ║     attached storage     ║
║     Direct)                 ║                          ║
╠═════════════════════════════╬══════════════════════════╣
║  * RTO from 30              ║  * RTO typically less    ║
║     seconds to 20 minutes.  ║     than 30 seconds      ║
╠═════════════════════════════╬══════════════════════════╣
║  * RPO: no data loss.       ║  * RPO: ???              ║
╠═════════════════════════════╬══════════════════════════╣
║  * Only Passive Secondaries ║  * Active or Passive     ║
║                             ║     Secondaries          ║
╠═════════════════════════════╬══════════════════════════╣
║  * One SQL Server           ║  * Multiple SQL Server   ║
║     instance/license        ║     instances / licenses ║
╚═════════════════════════════╩══════════════════════════╝
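
As a rough illustration of how the chart above could be turned into selection criteria, here is a minimal Python sketch. It only encodes the values from the chart. The names `HaOption` and `candidates` are hypothetical, the 30-second figure for AG is treated as an upper bound (the chart only says "typically"), and the AG RPO is left unknown because the chart does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HaOption:
    """One column of the comparison chart (illustrative only)."""
    name: str
    scope: str                       # "server" or "database" level protection
    needs_shared_storage: bool       # SAN or Storage Spaces Direct
    worst_case_rto_seconds: int      # upper RTO figure quoted in the chart
    zero_data_loss: Optional[bool]   # None means the chart does not say
    readable_secondaries: bool       # "active" secondaries
    multiple_licenses: bool


# Values copied from the chart; AG's 30 s is an assumed upper bound.
FCI = HaOption("FCI", "server", True, 20 * 60, True, False, False)
AG = HaOption("AG", "database", False, 30, None, True, True)


def candidates(max_rto_seconds: int,
               need_readable_secondary: bool = False,
               shared_storage_available: bool = True) -> List[str]:
    """Return the names of the options that satisfy the given requirements."""
    result = []
    for option in (FCI, AG):
        if option.worst_case_rto_seconds > max_rto_seconds:
            continue  # cannot guarantee the requested RTO
        if need_readable_secondary and not option.readable_secondaries:
            continue  # FCI secondaries are passive only
        if option.needs_shared_storage and not shared_storage_available:
            continue  # FCI needs a SAN or S2D
        result.append(option.name)
    return result
```

For example, `candidates(60)` keeps only the AG, because the chart's worst-case FCI RTO is 20 minutes, while `candidates(1200)` keeps both. Cost and RPO are deliberately not part of the filter, since the chart gives no numbers for them.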

The article didn't comment on AG RPO, but I read elsewhere that there is no data loss when recovering from a synchronous secondary replica. I don't know if that is true, and I don't know what the RPO of an asynchronous secondary replica might be. In Azure there is an AG resource that requires two domain controllers to be created along with the AG (regardless of whether you have your own domain controllers). I don't know if an Azure FCI has the same heavy-handed requirement.

Most significantly – I still don't know why it is valuable to combine the two techniques. I've only read that it "improves" or "maximizes" availability. That is a vague claim IMO.

More trivia: I also spotted a discussion suggesting that an FCI of two nodes should be avoided.

Best Answer

In general, FCI is more beneficial than AG, for the reasons already mentioned above. A 2-node FCI is perfectly fine, but it should not be built on top of Storage Spaces Direct, since S2D is known to be problematic in deployments with fewer than 4 nodes (source).

For Azure, I would rather stick to a VSAN that is designed to work in 2- or 3-node scenarios, like (example).

Here is a bit more information on FCI vs AG that you might find useful too.