First, here is the description of the wait type your question is about:
Waiting for transaction commit processing for the synchronized secondary databases to harden the log. This wait is also reflected by the Transaction Delay performance counter. This wait type is expected for synchronized availability groups and indicates the time to send, write, and acknowledge log to the secondary databases.
https://msdn.microsoft.com/en-us/library/ms179984.aspx
Digging into the mechanics of this wait: the commit on the primary has to wait until the log blocks have been transmitted to the synchronized secondaries and hardened there, and the acknowledgements have come back. With that being the case, and given that you added additional replicas, it stands to reason that your HADR_SYNC_COMMIT waits may increase due to the increased bandwidth and acknowledgement requirements. In this case Aaron Bertrand is exactly correct in his comments on the question.
Source: http://blogs.msdn.com/b/psssql/archive/2013/04/26/alwayson-hadron-learning-series-hadr-sync-commit-vs-writelog-wait.aspx
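If you want to quantify this on your own system, a rough starting point (just a sketch; the DMV and column names are standard, but how you filter and interpret the numbers is up to you) is to compare the aggregate HADR_SYNC_COMMIT wait against the per-database log send state on the primary:

-- Aggregate HADR_SYNC_COMMIT time since the last restart / wait-stats clear
SELECT wait_type, waiting_tasks_count, wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'HADR_SYNC_COMMIT';

-- Per-replica, per-database log send and redo state; run this on the primary
SELECT ar.replica_server_name, drs.database_id, drs.synchronization_state_desc,
       drs.log_send_queue_size, drs.log_send_rate,
       drs.redo_queue_size, drs.redo_rate, drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON drs.replica_id = ar.replica_id;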
Digging into the second part of your question, about how this wait could be related to application slowdowns: I believe this is a correlation-versus-causation issue. You are looking at an increase in your waits and a recent user complaint and drawing the conclusion, potentially incorrectly, that the two are related, when that may not be the case at all. The fact that you added tempdb files and your application became more responsive suggests to me that you had some underlying tempdb contention, which could have been exacerbated by the implicit snapshot isolation overhead that applies when a database is in an availability group. That may have had little or nothing to do with your HADR_SYNC_COMMIT waits.
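If you want to check whether tempdb allocation contention is still part of the picture, one quick check (a sketch only; sys.dm_os_waiting_tasks and these columns exist in all supported versions, but you will want to sample it repeatedly while the workload is running) is to look for PAGELATCH waits against tempdb pages:

-- Sessions currently waiting on page latches in tempdb (database_id = 2);
-- resource_description is formatted as dbid:fileid:pageid
SELECT wt.session_id, wt.wait_type, wt.wait_duration_ms, wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE N'PAGELATCH%'
  AND wt.resource_description LIKE N'2:%';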
If you want to test the HADR_SYNC_COMMIT side of this, you can use an Extended Events trace that captures the hadr_db_commit_mgr_update_harden event on your primary replica and establish a baseline. Once you have your baseline, add your replicas back in one at a time and see how the trace changes. I would strongly encourage you to write the event file to a volume that does not contain any databases, and to set a rollover and maximum file size. Adjust the duration filter as needed so that the captured events line up with your waits; that will let you troubleshoot further and correlate the data with any other teams that need to be involved.
-- Run this on the primary replica
CREATE EVENT SESSION [HADR_SYNC_COMMIT-Monitor] ON SERVER
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden(
    -- The delay filter is measured in milliseconds; use it to avoid collecting too many events
    WHERE ([delay] > (10)))
ADD TARGET package0.event_file(SET filename = N'<YourFilePathHere>')
WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 30 SECONDS, MAX_EVENT_SIZE = 0 KB,
      MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = OFF, STARTUP_STATE = OFF)
GO
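Once the session exists you still need to start it, and when you are done you can read the captured events back out of the file target. This is just a sketch; replace the placeholder path with wherever you pointed the event_file target (the wildcard picks up rollover files):

ALTER EVENT SESSION [HADR_SYNC_COMMIT-Monitor] ON SERVER STATE = START;
-- ...let it run long enough to capture a representative sample, then:
ALTER EVENT SESSION [HADR_SYNC_COMMIT-Monitor] ON SERVER STATE = STOP;

-- Read the captured events back for analysis
SELECT CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file(N'<YourFilePathHere>*.xel', NULL, NULL, NULL);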
The problem I'm experiencing is that the function takes a long time to run (about 3 seconds per database)
It's possible that this is among a growing list of dynamic management objects that don't support predicate push-down, so you will probably notice higher delays as the number of AGs and/or AG databases increases on any given replica.
And it's possible that's exactly what was fixed, as @Kin pointed out, in Connect #783011; sadly the Connect item doesn't indicate where exactly you can find the fix.
Based on some quick searches (and the date of the comment), I think you will want at least 2012 SP1 CU8, which (via KB #2918791) fixed a regression introduced in 2012 SP1 CU7 (via KB #2887115).
I don't see any evidence in later builds suggesting this function has been modified after 2012 SP1 CU8, and that modification was possibly just related to the regression that had been introduced.
But I am a huge proponent of being on the most recent builds (a quick way to check your current build is sketched after this list):
- the most recent service pack branch: SP2 is the only currently supported branch; RTM and SP1 have been retired, so if you're still on those, it's time to move;
- the latest CU (currently SP2 CU #8);
- and, if your applications use connection pooling, you may also be interested in this on-demand hotfix.
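To check which branch and CU you are currently running, something like the following works (SERVERPROPERTY is standard; note that ProductUpdateLevel may return NULL on builds that don't expose the CU level):

SELECT SERVERPROPERTY('ProductVersion')     AS ProductVersion,     -- e.g. 11.0.xxxx.x for SQL Server 2012
       SERVERPROPERTY('ProductLevel')       AS ProductLevel,       -- RTM / SPn
       SERVERPROPERTY('ProductUpdateLevel') AS ProductUpdateLevel; -- CU level, where available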
can anyone recommend an alternative that is faster?
If you look at the source for sys.fn_hadr_backup_is_preferred_replica, you can see that there is some convoluted logic and waterfall variable population that you could probably short-circuit, if you are not using mirroring or log shipping, by writing your own cleaned-up version. But then you get to calls to forbidden system functions like RetrieveDbReplicaState(), which you can't call yourself, and you realize that there isn't a way to rewrite this function and replicate its full functionality.
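If you want to read that source yourself, the definition should be visible through the standard metadata functions (a sketch; this only reads the definition, it doesn't let you change it):

-- View the T-SQL body of the system function
SELECT OBJECT_DEFINITION(OBJECT_ID(N'sys.fn_hadr_backup_is_preferred_replica')) AS function_definition;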
So, your best bet is to apply whatever fix Microsoft has supplied in order to optimize the function. If you still experience slowness after moving to the latest build, you can file another Connect item, or you can raise a case with CSS directly (just be aware that if you have 10,000 databases in your AG, they may just tell you to stop doing that).
It sounds like you have a few different questions in here:
Could a query be modified to check whether redo on the underlying data is caught up, and if not, stall execution until redo catches up? This seems like a really dangerous idea, because when redo gets behind, you'll have more and more queries tying up worker threads while they wait for it to catch up. I can easily see worker thread exhaustion hitting here and causing THREADPOOL waits, because it's essentially the same scenario that blocking causes. (A query for measuring how far behind redo is appears after this list.)
Could queries be redirected to the most current secondary? Yes, by using a load balancer between your app and the SQL Servers. You can build logic into load balancers to run periodic health checks, and when a server fails its health check, it's taken out of the available connection list until it's able to pass health checks again. This is a pretty common strategy for web servers, but fairly rare for database servers (since it's a lot of work to build out).
Can redo be done faster? Yes: test whether serial or parallel redo is a better fit for your workloads, and make sure you're getting the one you want. Microsoft has blogged about the gotchas with parallel redo, including gems like the fact that only the first databases in your AG (by database ID) get parallel redo.
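As a starting point for the redo measurement mentioned above, this sketch shows how far behind redo is for the local databases on a secondary (redo_queue_size and redo_rate are reported in KB and KB/sec; run it on the secondary, or drop the is_local filter on the primary to see every replica):

-- How far behind is redo on each local AG database?
SELECT DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.redo_queue_size,   -- KB of log waiting to be redone
       drs.redo_rate,         -- KB/sec redo throughput
       drs.last_redone_time
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1;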