Sql-server – I would like to understand better the difference between “sql server stand alone” and “sql server failover instance”

clusteringfailoversql serversql server 2014

AS these question asks:

Are "Always On Failover Cluster Instances" and "SQL Server Failover Clustering" the same thing?

Shouldn't Windows Server Failover Clustering already take care of SQLServer Failover Clustering?

I have some doubts about clusters.

In my first job i remember we had a failover clustering with 4 nodes. There were 3 SQL servers instances on each node and a empty node for failover. If we had a problem with a node, it would failover to that empty node and not cause performance problems with a node that has already a SQL Instance.

Question: If I'm not mistaken we had a SHARED DISK failover cluster. So, every database was inside a single disk that was shared with all nodes. But what happened for an instance to change to the empty node? I mean, there were no SQL instalation there, no instance, but after a failover, the instance was there with all its jobs. Does this mean that the virtual machine went to the empty node?

Now I have a new job and I still can't quite understand te concept of clusters. We have a failover cluster with 2 nodes. Here I created an Always on availability groups environment. It works great ( The only problem is we have no quorum yet but we will get there ). The environment here is different. I had to install 2 SQL Server instances on each node. And this is where I would like to reach.

Does this mean that on that first job we had a sql server failover cluster installation? In this new job I just installed the normal stand-alone installation instance on both nodes.

( I had 0 sql knowledge when I started at my first job so I really cant remember what was the configuration ).

So basically, whats the difference between sql server stand-alone installation and a `sql server failover cluster installation.

and One more question. almost weekly we are suffering from failover. I'm sure node1 is losing connection with node2 and then the quorum is lost. The concept with the quorum is, if node1 lose connection with node2, the quorum can say" oh, they are not communitacing with each other, but node1 is still alive so I will not failover them. Is this right?

Best Answer

First, lets define a few terms:

SQL Server Failover Cluster / Always On Failover Cluster Instance

These terms are interchangeable. An Always On Failover Cluster is just the modern term for a SQL Server Failover Cluster or SQL FCI. An FCI consists of multiple nodes in a Windows Server Failover Cluster with Shared Disks.

SQL Server software is installed on each node, however, the instance that is installed is a special instance that is clustered - a Failover Cluster Instance. The WSFC service manages the failover of the SQL Instance and Shared Disks between nodes to provide high availability and disaster recovery. The SQL Server service for that instance exists on all nodes but is only started and running on the active node of the cluster.

For all intents and purposes, there is only one instance and it moves between nodes along with its underlying Shared Disks. Data synchronisation between the nodes is achieved by using the same underlying disks, so data states between nodes are identical.

SQL Server Always On Availability Group

A SQL Server Always On Availability Group (SQL AG) is a modern solution for high availability and disaster recovery that does not require Shared Disk infrastructure (in fact, even a WSFC cluster is not required in some cases, but that's a different topic).

It still utilises a WSFC cluster to manage failover and availability, however, the synchronisation of data between nodes does not leverage shared disks to ensure data is up to date, instead, a mechanism is used entirely within SQL Server to transport blocks of data between nodes, or Replicas as they're called in an AG.

Each Replica in an AG architecture is an independent SQL Server instance. The underlying machine is joined to the WSFC cluster as a node and the SQL Server instances on each node have the Always On feature enabled.

When an AG is created between these replicas, the data movement mechanisms begin moving data between the replicas for the databases that are joined to the AG. This is very similar to, and built on the same technology as, database mirroring from earlier versions of SQL Server. Data synchronisation is achieved using this 'mirroring' technology.

Standalone SQL Server Instance

A Standalone SQL Server Instance is quite simply a SQL Server instance that is not linked or joined with another through HADR technology. As the name suggests, this instance stands alone.

While it may seem like an Always On architecture features multiple standalone instances, because the servers are joined via the underlying WSFC cluster and the instances are joined via the Availability Group, they are not standalone instances.

Cluster vs Cluster

Often the terms cluster, SQL cluster and Always On cluster are thrown around fairly interchangeably between Always On and FCIs but there is an important distinction - a SQL Cluster is a SQL Server Failover Cluster Instance, an Always On cluster doesn't really exist, its an Always On Availability Group and shoud be referred to as such and a 'cluster' typically refers to the underlying Windows Server Failover Cluster that the FCI or Always On Ag is built upon.

Now to your questions:

Question: If I'm not mistaken we had a SHARED DISK failover cluster. So, every database was inside a single disk that was shared with all nodes. But what happened for an instance to change to the empty node?

A Shared Disk FCI has the software installed on each node, but the instance is moved between nodes as required. Think of the instance as whereever the master database resides. When an FCI fails over, the shared disk hosting master fails over too so when SQL Server is started on the new node, it starts the master database that has been moved and thereby starts up the instance.

I mean, there were no SQL instalation there, no instance, but after a failover, the instance was there with all its jobs. Does this mean that the virtual machine went to the empty node?

Actually, SQL Server was installed on that node. The VM did not move, the instance and all of its DBs, jobs etc moved when the disk hosting the files moved. The SQL Server service on node 2 started up after failover and reconnected to the instance files.

Does this mean that on that first job we had a sql server failover cluster installation? In this new job I just installed the normal stand-alone installation instance on both nodes.

In the new environment, it sounds like they have installed an Always On Availability Group architecture.

One more question. almost weekly we are suffering from failover. I'm sure node1 is losing connection with node2 and then the quorum is lost. The concept with the quorum is, if node1 lose connection with node2, the quorum can say" oh, they are not communitacing with each other, but node1 is still alive so I will not failover them. Is this right?

Quorum is supposed to be the tie-breaker - if both nodes vote that they cannot see the other one, then each would want to take over the primary role (FCI or AG) and this would lead to split-brain. The WSFC cluster won't let this happen and will shut down the cluster instead.

Quorum gives you a deciding vote when you have an even number of nodes. If the cluster service on node 1 cannot communicate with node 2 but it can communicate with the quorum witness (file share or shared disk), then it has two votes (one from the node and one from the quorum witness) and node 2 only has one vote. It is overruled and node 1 stays live. If Node 2 has the two votes then failover is initiated.

It is important to always have an odd number of votes. In later versions of Windows Server, node weight and node votes are adjusted dynamically to try and enforce an odd number of votes, but with only two votes, it cannot enforce this.

UPDATE -

Below is a diagram from Microsoft that helps illustrate the differences:

Always On Technologies

As you can see from the diagram, an Availability Group is built across independent instances of SQL Server installed on servers joined to a WSFC cluster. A Failover Cluster Instance is installed as a single instance of SQL Server across multiple nodes in the WSFC cluster, using Shared Storage.

Even though the binaries and executables for SQL Server are installed on all nodes in an FCI, the instance (system databases, user databases etc) is installed on shared storage and is only active on a single node at a time.

In an AG, both independent instances are up and running at the same time, and the AG Listener (a virtual network name) only listens on the primary replica.

This link provides detailed information about business continuity in SQL Server, including AGs and FCIs.