Sql-server – SQL Always On availability group latency

availability-groups, sql-server

We have a SQL Server 2014 Always On availability group set up for a database, using synchronous-commit availability mode. We ran a few PAL (Performance Analysis of Logs) reports to collect performance data on the primary replica, and they showed I/O alerts where reads and writes took longer than 25 ms. The alerts occurred for both logical and physical disk I/O, on drives C:, F: (which holds the database files) and L: (which holds the transaction logs). I ran this statement:

SELECT 
    wait_type, waiting_tasks_count, wait_time_ms,
    wait_time_ms/waiting_tasks_count as 'time_per_wait'
FROM 
    sys.dm_os_wait_stats 
WHERE 
    waiting_tasks_count > 0
    and wait_type = 'HADR_SYNC_COMMIT';

and got 12 ms for the time_per_wait value, which I take to mean that the latency between the primary and secondary replicas was 12 ms.

My question: does the I/O response time reported by PAL include this 12 ms latency between the primary and secondary replicas?

Thanks

Best Answer

According to PSS (Microsoft Product Support Services), HADR_SYNC_COMMIT is the time it takes for the primary to be notified that the log block has been written (hardened) on the secondary, though not yet replayed. After that, the wait type changes to WRITELOG and the log block is written locally.

Logically, then, any disk waits incurred while writing the block on the secondary must be included in that HADR_SYNC_COMMIT time.
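Keep in mind that sys.dm_os_wait_stats counters are cumulative since the instance last started (or since they were last cleared with DBCC SQLPERF), so the 12 ms figure in the question is a long-run average rather than the latency during the PAL capture. To line the two up, you could read the HADR_SYNC_COMMIT row at the start and end of the PAL window and average only the difference. Below is a minimal Python sketch of that arithmetic; the WaitSnapshot type and the function name are hypothetical, and it assumes you have already captured the waiting_tasks_count and wait_time_ms values yourself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitSnapshot:
    """One reading of the HADR_SYNC_COMMIT row from sys.dm_os_wait_stats (hypothetical type)."""
    waiting_tasks_count: int
    wait_time_ms: int


def avg_wait_ms_between(start: WaitSnapshot, end: WaitSnapshot) -> Optional[float]:
    """Average ms per wait over the interval between two snapshots.

    Returns None when the task count did not increase (avoids dividing by zero)
    or when a counter went backwards, e.g. because the stats were cleared or
    the instance restarted between the two readings.
    """
    tasks = end.waiting_tasks_count - start.waiting_tasks_count
    wait_ms = end.wait_time_ms - start.wait_time_ms
    if tasks <= 0 or wait_ms < 0:
        return None
    return wait_ms / tasks


# Illustrative numbers: 1,000 waits totalling 12,000 ms during the window.
before = WaitSnapshot(waiting_tasks_count=50_000, wait_time_ms=400_000)
after = WaitSnapshot(waiting_tasks_count=51_000, wait_time_ms=412_000)
print(avg_wait_ms_between(before, after))  # 12.0
```

Using the interval rather than the lifetime total means a busy hour is not diluted by months of quiet periods.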

If you run PAL on your secondary, you may find that it has lower disk latency and so can complete its writes faster. It would be worth taking a look.
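If you also want a SQL Server-side figure to set beside PAL's disk counters on each replica, sys.dm_io_virtual_file_stats exposes per-file counters (num_of_reads, io_stall_read_ms, num_of_writes, io_stall_write_ms); dividing the stall time by the operation count gives an average latency per I/O. Below is a minimal Python sketch that does this for rows you have already fetched and flags files over the 25 ms threshold from the question. The FileStats type, the function names, and the sample numbers are hypothetical, and these counters are cumulative too, so diff two snapshots if you need a specific window.

```python
from dataclasses import dataclass
from typing import Optional

PAL_THRESHOLD_MS = 25.0  # the >25 ms alert threshold mentioned in the question


@dataclass
class FileStats:
    """Selected columns from one sys.dm_io_virtual_file_stats row (hypothetical type)."""
    physical_name: str
    num_of_reads: int
    io_stall_read_ms: int
    num_of_writes: int
    io_stall_write_ms: int


def avg_latency_ms(stall_ms: int, ops: int) -> Optional[float]:
    """Average stall per I/O, or None if there were no I/Os of that kind."""
    return stall_ms / ops if ops > 0 else None


def slow_files(rows: list[FileStats], threshold_ms: float = PAL_THRESHOLD_MS) -> list[str]:
    """Names of files whose average read or write latency exceeds the threshold."""
    flagged = []
    for row in rows:
        read_ms = avg_latency_ms(row.io_stall_read_ms, row.num_of_reads)
        write_ms = avg_latency_ms(row.io_stall_write_ms, row.num_of_writes)
        if any(ms is not None and ms > threshold_ms for ms in (read_ms, write_ms)):
            flagged.append(row.physical_name)
    return flagged


# Illustrative rows only (made-up numbers).
rows = [
    FileStats(r"F:\Data\MyDb.mdf", num_of_reads=2_000, io_stall_read_ms=60_000,
              num_of_writes=500, io_stall_write_ms=5_000),      # reads average 30 ms
    FileStats(r"L:\Logs\MyDb_log.ldf", num_of_reads=10, io_stall_read_ms=50,
              num_of_writes=8_000, io_stall_write_ms=16_000),   # writes average 2 ms
]
print(slow_files(rows))  # ['F:\\Data\\MyDb.mdf']
```

Running the same calculation against snapshots from the primary and the secondary gives a like-for-like comparison of their disk latency.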

Incidentally, the MSSQL Tiger Team has a video, a demo Extended Events (XE) script, and a Power BI sample that gather AG synchronization information and display it in graphs; if you're interested in exploring that, give it a try.

Related Solutions

Sql-server – Check the data latency between two Always On Availability Group servers in ASYNC mode

I used this script in a custom report once.

-- Run this on the primary replica: on a secondary, sys.dm_hadr_database_replica_states
-- returns rows only for the local secondary databases, so there is no PRIMARY row to compare with.
-- The instance names below are placeholders; replace them with your own replica names.
;WITH AG_Stats AS (
    SELECT AGS.name                       AS AGGroupName,
           AR.replica_server_name         AS InstanceName,
           HARS.role_desc,
           DB_NAME(DRS.database_id)       AS DBName,
           DRS.database_id,
           AR.availability_mode_desc      AS SyncMode,
           DRS.synchronization_state_desc AS SyncState,
           DRS.last_hardened_lsn,
           DRS.end_of_log_lsn,
           DRS.last_redone_lsn,
           DRS.last_hardened_time, -- On a secondary database, time of the log-block identifier for the last hardened LSN (last_hardened_lsn).
           DRS.last_redone_time,   -- Time when the last log record was redone on the secondary database.
           DRS.log_send_queue_size,
           DRS.redo_queue_size,
           -- Time corresponding to the last commit record.
           -- On the secondary database, this time is the same as on the primary database.
           -- On the primary replica, each secondary database row displays the time that the secondary replica that hosts that secondary database
           --   has reported back to the primary replica. The difference in time between the primary-database row and a given secondary-database
           --   row represents approximately the recovery point objective (RPO), assuming that the redo process is caught up and that the progress
           --   has been reported back to the primary replica by the secondary replica.
           DRS.last_commit_time
    FROM sys.dm_hadr_database_replica_states DRS
    LEFT JOIN sys.availability_replicas AR
        ON DRS.replica_id = AR.replica_id
    LEFT JOIN sys.availability_groups AGS
        ON AR.group_id = AGS.group_id
    LEFT JOIN sys.dm_hadr_availability_replica_states HARS
        ON AR.group_id = HARS.group_id
       AND AR.replica_id = HARS.replica_id
),
Pri_CommitTime AS (
    SELECT DBName, last_commit_time
    FROM AG_Stats
    WHERE role_desc = 'PRIMARY'
),
-- Reporting secondary in the primary data center (placeholder name).
Rpt_CommitTime AS (
    SELECT DBName, last_commit_time
    FROM AG_Stats
    WHERE role_desc = 'SECONDARY'
      AND InstanceName = 'InstanceNameB-PrimaryDataCenter'
),
-- Failover secondaries in the secondary data center (placeholder names).
-- If both instances match, the final result has one row per instance for each database.
FO_CommitTime AS (
    SELECT DBName, last_commit_time
    FROM AG_Stats
    WHERE role_desc = 'SECONDARY'
      AND InstanceName IN ('InstanceNameC-SecondaryDataCenter', 'InstanceNameD-SecondaryDataCenter')
)
SELECT p.DBName                                         AS [DatabaseName],
       p.last_commit_time                               AS [Primary_Last_Commit_Time],
       r.last_commit_time                               AS [Reporting_Last_Commit_Time],
       DATEDIFF(ss, r.last_commit_time, p.last_commit_time) AS [Reporting_Sync_Lag_(secs)],
       f.last_commit_time                               AS [FailOver_Last_Commit_Time],
       DATEDIFF(ss, f.last_commit_time, p.last_commit_time) AS [FailOver_Sync_Lag_(secs)]
FROM Pri_CommitTime p
LEFT JOIN Rpt_CommitTime r ON r.DBName = p.DBName
LEFT JOIN FO_CommitTime f ON f.DBName = p.DBName;
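The lag columns in the script above come from DATEDIFF(ss, ...), which returns the number of second boundaries crossed between the two times rather than the elapsed time. The lags are therefore whole seconds and can read 1 even when the two commits are a millisecond apart, and a LEFT JOIN that finds no matching secondary produces a NULL lag. Below is a minimal Python model of that arithmetic for reasoning about the report's numbers; the function name is hypothetical, and it ignores the rounding of SQL Server's datetime type.

```python
from datetime import datetime
from typing import Optional


def datediff_seconds(start: Optional[datetime], end: Optional[datetime]) -> Optional[int]:
    """Model of T-SQL DATEDIFF(ss, start, end): the count of second boundaries crossed.

    Truncating both values to whole seconds before subtracting yields the boundary
    count, which can differ from the true elapsed time by up to one second.
    A missing value (NULL from the LEFT JOIN) yields None, as DATEDIFF returns NULL.
    """
    if start is None or end is None:
        return None
    start_s = start.replace(microsecond=0)
    end_s = end.replace(microsecond=0)
    return int((end_s - start_s).total_seconds())


secondary_commit = datetime(2024, 1, 1, 12, 0, 0, 999_000)
primary_commit = datetime(2024, 1, 1, 12, 0, 1, 0)
print(datediff_seconds(secondary_commit, primary_commit))  # 1, although only 1 ms elapsed
print(datediff_seconds(None, primary_commit))              # None
```

For an RPO check this granularity is usually fine, but a reported lag of 1 second does not by itself mean the secondary is a full second behind.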

SQL Server AlwaysON – Troubleshooting Log_Send_Queue_Size Issues

Answering my own question, since future readers may benefit from it:

It seems we were hitting a known issue: longer latency for SQL Server 2012 databases when you use Service Broker, database mirroring, and availability groups. It is fixed in SQL Server 2012 SP2 CU1. KB 2976982, which documents it, contains a typo ("AlawysOn"), so a search for "AlwaysOn" won't find it.

After the patch was applied, the issue was fixed.
