How to measure the delay between secondary node to catch up with the primary node?
I am using red gate to monitor the databases and this is what mentioned there
Guideline values: To estimate how long a secondary replica will take to catch up with the primary replica, divide the log send queue by the rate of log bytes received.
Based on this graph can you advise me how can I measure this delay ? it is not clear for me how to divide these 2 values
If there is another way to measure the delay I will appreciate to share
Best Answer
I'm not experienced with RedGate's monitoring tools, but SSMS has an Availability Group Dashboard out-of-the-box that I've always found particularly helpful with basic metrics like this. Specifically the metric in the dashboard that would be helpful to you is called the Estimated Recovery Time (seconds):
I also found the Redo Queue Size (KB):
And the Redo Rate (KB/sec) particularly helpful in monitoring the health of my secondary replicas:
If you want to access this information more natively so you can collect the metrics over time, the above Availability Group Dashboard just leverages the following DMVs:
You can find more information on querying the DMVs for this information in SQLPerformance's Monitoring Availability Group Replica Synchronization. This is the query from the article (with one additional calculated column to get you the Estimated Recovery Time):
You can create a SQL job to routinely log this information to a table so you have a historical comparison to achieve your goals with. (This is essentially what RedGate's Performance Monitor is likely doing under the hood.)
The above article also briefly mentions a third way to monitor these metrics via Perfmon Counters.
And finally a third and equally effective way to log your Availability Groups health metrics is via the built-in Extended Events.