I'm trying to find what query (or queries) need to be run in order to determine if there is any sort of a problem with an availability group. I have been using:
SELECT @@SERVERNAME AS 'primary_replica', cs.database_name, rs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states rs
join sys.dm_hadr_database_replica_cluster_states cs on rs.replica_id =
cs.replica_id and rs.group_database_id = cs.group_database_id
WHERE rs.is_local = 1
It turns out that the synchronization health (select * from sys.dm_hadr_availability_group_states
) can be NOT_HEALTHY, but the query above does not show any problem with the replica. I have tried looking through all of the various DMs that contain information, but it is a maze or words and I can't find my way to the SQL cheese that must be there somewhere.
What is the one query that will indicate that there is a problem that needs to be addressed? I don't need it to be specific or to designate the cause, I just need to know that there is a problem. I know that during failover some of the states will be NOT_HEALTHY, and we account for that by requiring x minutes of unhealthiness before sending an alert.
Best Answer
Following is a SQL query that can be run against all nodes in the cluster to check all of the AG-related health states. The output could be formatted differently depending on your needs, but this one returns either a list of the objects that are not healthy, or a row that includes the server name and that all objects are healthy.