Sql-server – The curious case of HADR_SYNC_COMMIT waits

sql server, sql server 2014

We are noticing an interesting pattern for HADR_SYNC_COMMIT waits in our environment. We have three replicas in one datacenter: one primary, one sync secondary, and one async secondary. We just added three more ASYNC replicas in another datacenter (~2400 miles away).

Ever since, we have seen an enormous increase in HADR_SYNC_COMMIT waits. When we look at the active sessions, we see a bunch of COMMIT TRANSACTION queries waiting on the SYNC replica.

From the screenshot, we can clearly see a jump in HADR_SYNC_COMMIT waits on June 29. We eventually dropped two of the three async replicas in the remote datacenter around noon on July 1st, and the wait times dropped considerably along with them.

What we have checked so far: log send queue, redo queue, last hardened time, and last commit time on the remote replicas. We have continuous bursts of small transactions during business hours, so the send queues are pretty small at any given moment (anywhere between 60 KB and 1 MB).
The remote replicas are almost in sync; there is very little difference between the last commit time and the last hardened time for any individual LSN on the replicas.

The network pipe is 10G, and we increased the transmit buffer size from 256 MB to 2 GB. We made this change on the assumption that the network was dropping packets and retransmitting them, but it didn't seem to help much.

So, I'm wondering: what do the ASYNC replicas have to do with HADR_SYNC_COMMIT waits? Shouldn't this wait type depend on the SYNC replica alone? What am I missing here?

Best Answer

First, the description of the wait type that your question is about:

Waiting for transaction commit processing for the synchronized secondary databases to harden the log. This wait is also reflected by the Transaction Delay performance counter. This wait type is expected for synchronized availability groups and indicates the time to send, write, and acknowledge log to the secondary databases.

https://msdn.microsoft.com/en-us/library/ms179984.aspx

Digging into the mechanics of this wait, you have the log blocks being transmitted and hardened, but recovery not yet completed on the remote servers. Given that, and given that you added additional replicas, it stands to reason that your HADR_SYNC_COMMIT waits may increase because of the added bandwidth requirements. In this case Aaron Bertrand is exactly correct in his comments on the question.

Source: http://blogs.msdn.com/b/psssql/archive/2013/04/26/alwayson-hadron-learning-series-hadr-sync-commit-vs-writelog-wait.aspx
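To put a number on the wait before and after a replica change, a minimal sketch along these lines could be run on the primary. It reads the cumulative counters in sys.dm_os_wait_stats, so the values cover the period since the last restart (or since the wait statistics were last cleared). Comparing two snapshots taken some time apart is more informative than a single reading. Including WRITELOG for comparison is my own assumption about what is useful here; the avg_wait_ms column is simply derived from the two counters.

```sql
-- Run on the primary replica.
-- Counters are cumulative; capture two snapshots and subtract to get a rate.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms,
       CAST(wait_time_ms AS decimal(19, 2))
           / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms  -- NULL if no waits recorded
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'HADR_SYNC_COMMIT', N'WRITELOG');
```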

Now for the second part of your question, about how this wait could be related to application slowdowns. This, I believe, is a correlation-versus-causation issue. You are looking at your waits increasing alongside a recent user complaint and concluding, potentially incorrectly, that the two are related, when this may not be the case at all. The fact that you added tempdb files and your application became more responsive indicates to me that you may have had some underlying contention issues, which could have been exacerbated by the overhead of the implicit snapshot isolation level that applies when a database is in an availability group. This may have had little or nothing to do with your HADR_SYNC_COMMIT waits.

If you want to test this, you could use an extended event trace that looks at the hadr_db_commit_mgr_update_harden XEvent on your primary replica to get a baseline. Once you have your baseline, you can add your replicas back in one at a time and see how the trace changes. I would strongly encourage you to use a file that resides on a volume that does not contain any databases, and to set a rollover and maximum size. Please adjust the duration filter as needed to gather events that match up with your waits, so that you can troubleshoot further and correlate this with any other teams that need to be involved.

CREATE EVENT SESSION [HADR_SYNC_COMMIT-Monitor] ON SERVER  -- Run this on the primary replica 
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden(
    WHERE ([delay]>(10))) -- I strongly encourage you to use the delay filter to avoid getting too many events back, this is measured in milliseconds
ADD TARGET package0.event_file(SET filename=N'<YourFilePathHere>')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
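Because the session is created with STARTUP_STATE=OFF, it has to be started explicitly before it collects anything. A rough sketch of starting it and then reading the captured events back is shown here. The file path is the same placeholder as in the session definition and must be replaced with your own; in practice a wildcard pattern that matches the rollover files is usually passed. The XML path to the delay field is an assumption based on the field name used in the session's filter.

```sql
-- Start the session (run on the primary replica).
ALTER EVENT SESSION [HADR_SYNC_COMMIT-Monitor] ON SERVER STATE = START;
GO

-- Read the captured events back from the event file target.
-- Replace the placeholder with a path or wildcard pattern matching your .xel files.
SELECT x.event_xml.value('(event/@timestamp)[1]', 'datetime2') AS event_time_utc,
       x.event_xml.value('(event/data[@name="delay"]/value)[1]', 'bigint') AS delay_ms, -- assumed field name
       x.event_xml
FROM (
    SELECT CAST(event_data AS xml) AS event_xml
    FROM sys.fn_xe_file_target_read_file(N'<YourFilePathHere>', NULL, NULL, NULL)
) AS x
ORDER BY event_time_utc;
GO

-- Stop the session once the baseline has been collected.
ALTER EVENT SESSION [HADR_SYNC_COMMIT-Monitor] ON SERVER STATE = STOP;
GO
```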

Related Solutions

SQL Server – Always On DDL Operations Explained

As Brent Ozar mentioned in the comment section, it is not a simple task to find the wait type (and what is causing the wait) between primary and secondary and correlate it with time. I am answering your question about finding the source. I modified the extended event trace definition given in the blog post you mentioned, removing the WHERE clause so that you can capture all the sessions that are causing waits.

I also added a few more actions to capture more information. For example:

sqlserver.client_hostname
sqlserver.plan_handle
sqlserver.session_nt_username
sqlserver.sql_text

Here is the full definition.

CREATE EVENT SESSION [redo_wait_info] ON SERVER
ADD EVENT sqlos.wait_info(
    ACTION(package0.event_sequence, sqlos.scheduler_id, sqlserver.client_hostname, sqlserver.database_id, sqlserver.plan_handle, sqlserver.session_id, sqlserver.session_nt_username, sqlserver.sql_text))
ADD TARGET package0.event_file(
    SET filename = N'C:\Redo_Wait_Info.xel',
        max_file_size = (50),
        max_rollover_files = (100))
WITH (MAX_MEMORY = 4096 KB,
      EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 120 SECONDS,
      MAX_EVENT_SIZE = 0 KB,
      MEMORY_PARTITION_MODE = NONE,
      TRACK_CAUSALITY = OFF,
      STARTUP_STATE = ON)
GO
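An unfiltered wait_info session can generate a very large number of events on a busy server. If it turns out to be too noisy, one option is to restrict it to a single wait type. The wait_type field of this event is a mapped value whose numeric key differs between SQL Server versions, so the key has to be looked up on the instance itself. The following is only a sketch of that lookup; whether to filter at all is a judgment call that is not part of the original definition.

```sql
-- Look up the numeric key for a wait type on this instance.
-- The key varies by version, so do not hard-code a value taken from another server.
SELECT map_key, map_value
FROM sys.dm_xe_map_values
WHERE name = N'wait_types'
  AND map_value = N'HADR_SYNC_COMMIT';
```

The returned map_key could then be used in a predicate of the form WHERE ([wait_type] = (map_key)) on the sqlos.wait_info event.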

Sql-server – sys.dm_hadr_database_replica_states, understanding the different *_time columns

For starters, here's the Microsoft Docs page.

As an aside, I suggest you do some reading into the mechanics of an availability group, purely so you can understand my answer. (Microsoft Docs Page)

As for the answer to your question, as is normally the case with questions about an RDBMS, it depends.

In this case it depends on what you mean by "...for data to appear".

Let's have a look at the columns in your question:

last_sent_time,
last_received_time,
last_hardened_time,
last_redone_time,
last_commit_time

It is also worth noting that the system DMV you are referring to works at the database level.

last_sent_time

This time indicates the last time that the PRIMARY sent a Log Block to the available secondaries. This is the start of the data synchronisation process.

last_received_time

This indicates the last time that the secondary received a log block.

last_hardened_time

This indicates the last time that the secondary hardened a received log block, that is, wrote it to disk.

last_redone_time

This is the time that the last LSN was redone on the target database.

last_commit_time

This is the time at which the last commit record was redone and reported back to the primary.

Summary

The columns above mark the different points at which data enters the secondary systems.

The data first enters the server into memory at last_received_time

The data first enters the server on disk at last_hardened_time

The data first enters the database data files at last_redone_time

The data first becomes committed and available for reading by queries (outside of strange NOLOCK situations) at last_commit_time

I suspect that the answer to your question is the last of the four concepts. There is a small overhead in the time in this column, however, due to the transmission time of the data between the SECONDARY and the PRIMARY. This is likely to be unimportant when calculating the speed of data throughput, though.
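To look at these columns side by side, a query along the following lines could be run on the primary replica. It is a rough sketch only: each column holds the time of the most recent event for its own stage, not necessarily for the same log block, so the differences are an approximation rather than an exact latency. The join to sys.availability_replicas and the two derived columns are my own additions for readability.

```sql
-- Run on the primary replica; one row per secondary database.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.last_sent_time,
       drs.last_received_time,
       drs.last_hardened_time,
       drs.last_redone_time,
       drs.last_commit_time,
       -- Approximate gaps between stages, in seconds (NULL if either time is NULL).
       DATEDIFF(SECOND, drs.last_received_time, drs.last_hardened_time) AS received_to_hardened_sec,
       DATEDIFF(SECOND, drs.last_hardened_time, drs.last_redone_time)   AS hardened_to_redone_sec
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 0   -- skip the primary's own rows
ORDER BY ar.replica_server_name, database_name;
```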