Sql-server – Checkpoints on secondary replica AlwaysOn AG

availability-groupscheckpointrecoverysql serversql-server-2012

Setup

3 Node Alwayson cluster – 1 sync and 1 async secondary replica – SQL Server 2012

Situation

We are witnessing PageIOLatches when reading from the asynchronous secondary replica. This is mostly caused by the throughput of the SAN that has been throttled. The hosting partner told us that this limitation cannot immediately be alleviated due to hardware constraints.
The primary and synchrounous replica use other SANs with a higher throughput. Although this situation is far from ideal, this is a temporary situation that will be solved soon and not the subject of my question.

When investigating the IO waits we noticed that these occur concurrently with an increase in the number of checkpoint pages/sec.

enter image description here

I was under the impression that checkpoints don’t occur on secondary replica’s in an AG, just as discussed here.

To verify this behaviour, I’ve set up an extended event to monitor the checkpoint events on the asynchronous replica. Just as expected, no checkpoints were captured for this database, nor have I found any checkpoints from other databases that match the pattern.

Next, I’ve created the same extended event on the primary replica and started a perfmon to verify if we could witness the same behaviour. Here we were able to capture (automatic) checkpoints, they happen approx. once per minute. These checkpoints occur simultaneously with the checkpoint pages/sec increase on our secondary (and primary) replica. It seems that checkpoints are being generated on the primary and redone on the secondary replicas. This would mean that checkpoints do occur implicitly on secondary replica's in an AG.

Question

Is my assumption correct that in an AG checkpoints are being generated on the primary replica and redone on all secondary replicas?
And thus, if the database TARGET_RECOVERY_TIME isn't set, the recovery interval setting of the primary replica will dictate the checkpoints on all secondary replicas for these databases.

Best Answer

Is my assumption correct that in an AG checkpoints are being generated on the primary replica and redone on all secondary replicas?

Yes.

A CHECKPOINT is a logged operation, as it's part of SQL Server's recovery strategy. When the database starts up, it needs to know what modified pages (in the log file) haven't been written to the data file - so it can redo those changes before user queries start hitting the database.

I'm going to snag a screenshot from my blog post that looks at certain types of logging activity:

Screenshot of log file activity

As you can see, the CHECKPOINT "start" and "end" are written to the transaction log.

A simplified view of the way AGs work is that they redo log entries from the primary replica on all secondaries.

Thus, since CHECKPOINTs are logged, they will be replayed on secondary replicas.


As a side note, John Eisbrener usefully pointed out that, if there are other databases on the secondary replica, those CHECKPOINT operations are also going to be picked up by this Perfmon trace.