SQL Server – Monitoring Availability Groups

availability-groupssql server

I'm trying to find what query (or queries) need to be run in order to determine if there is any sort of a problem with an availability group. I have been using:

SELECT @@SERVERNAME AS 'primary_replica', cs.database_name, rs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states rs
join sys.dm_hadr_database_replica_cluster_states cs on rs.replica_id = 
    cs.replica_id and rs.group_database_id = cs.group_database_id
WHERE rs.is_local = 1

It turns out that the synchronization health (select * from sys.dm_hadr_availability_group_states) can be NOT_HEALTHY, but the query above does not show any problem with the replica. I have tried looking through all of the various DMs that contain information, but it is a maze or words and I can't find my way to the SQL cheese that must be there somewhere.

What is the one query that will indicate that there is a problem that needs to be addressed? I don't need it to be specific or to designate the cause, I just need to know that there is a problem. I know that during failover some of the states will be NOT_HEALTHY, and we account for that by requiring x minutes of unhealthiness before sending an alert.

Best Answer

Following is a SQL query that can be run against all nodes in the cluster to check all of the AG-related health states. The output could be formatted differently depending on your needs, but this one returns either a list of the objects that are not healthy, or a row that includes the server name and that all objects are healthy.

DECLARE @HealthStates AS TABLE (objectname VARCHAR(50), [type] VARCHAR(20), health_desc tinyint)

/*    Collect local database replica states    */
INSERT INTO @HealthStates
    SELECT cs.[database_name], 'database_replica', rs.synchronization_health
    FROM sys.dm_hadr_database_replica_states rs
    join sys.dm_hadr_database_replica_cluster_states cs ON rs.replica_id = cs.replica_id and rs.group_database_id = cs.group_database_id
    WHERE rs.is_local = 1

/*    Collect local AG states   */
INSERT INTO @HealthStates
    SELECT ag.[name], 'availability_group', gs.synchronization_health
    FROM sys.dm_hadr_availability_group_states gs
    join sys.availability_groups_cluster ag ON gs.group_id = ag.group_id
    WHERE gs.primary_replica = @@SERVERNAME

/*    Collect replica states   */
INSERT INTO @HealthStates
    SELECT cs.replica_server_name, 'server_replica', rs.synchronization_health
    FROM sys.dm_hadr_availability_replica_states rs
    join sys.availability_replicas cs ON rs.replica_id = cs.replica_id and rs.group_id = cs.group_id

/*    States to grid   */
--SELECT objectname, [type], health_desc FROM @HealthStates

/*    Test for not healthy   */
IF EXISTS (SELECT health_desc from @HealthStates where health_desc <> 2)
BEGIN
    SELECT objectname, type, 'NOT_HEALTHY' as health_desc from @HealthStates where health_desc <> 2
END
ELSE
BEGIN
    SELECT @@SERVERNAME AS objectname, 'All Objects' AS type, 'HEALTHY' as health_desc
END