Sql-server – Checkpoints on secondary replica AlwaysOn AG

availability-groupscheckpointrecoverysql serversql-server-2012

Setup

3 Node Alwayson cluster – 1 sync and 1 async secondary replica – SQL Server 2012

Situation

We are witnessing PageIOLatches when reading from the asynchronous secondary replica. This is mostly caused by the throughput of the SAN that has been throttled. The hosting partner told us that this limitation cannot immediately be alleviated due to hardware constraints.
The primary and synchrounous replica use other SANs with a higher throughput. Although this situation is far from ideal, this is a temporary situation that will be solved soon and not the subject of my question.

When investigating the IO waits we noticed that these occur concurrently with an increase in the number of checkpoint pages/sec.

I was under the impression that checkpoints don’t occur on secondary replica’s in an AG, just as discussed here.

To verify this behaviour, I’ve set up an extended event to monitor the checkpoint events on the asynchronous replica. Just as expected, no checkpoints were captured for this database, nor have I found any checkpoints from other databases that match the pattern.

Next, I’ve created the same extended event on the primary replica and started a perfmon to verify if we could witness the same behaviour. Here we were able to capture (automatic) checkpoints, they happen approx. once per minute. These checkpoints occur simultaneously with the checkpoint pages/sec increase on our secondary (and primary) replica. It seems that checkpoints are being generated on the primary and redone on the secondary replicas. This would mean that checkpoints do occur implicitly on secondary replica's in an AG.

Question

Is my assumption correct that in an AG checkpoints are being generated on the primary replica and redone on all secondary replicas?
And thus, if the database TARGET_RECOVERY_TIME isn't set, the recovery interval setting of the primary replica will dictate the checkpoints on all secondary replicas for these databases.

Best Answer

Is my assumption correct that in an AG checkpoints are being generated on the primary replica and redone on all secondary replicas?

Yes.

A CHECKPOINT is a logged operation, as it's part of SQL Server's recovery strategy. When the database starts up, it needs to know what modified pages (in the log file) haven't been written to the data file - so it can redo those changes before user queries start hitting the database.

I'm going to snag a screenshot from my blog post that looks at certain types of logging activity:

As you can see, the CHECKPOINT "start" and "end" are written to the transaction log.

A simplified view of the way AGs work is that they redo log entries from the primary replica on all secondaries.

Thus, since CHECKPOINTs are logged, they will be replayed on secondary replicas.

As a side note, John Eisbrener usefully pointed out that, if there are other databases on the secondary replica, those CHECKPOINT operations are also going to be picked up by this Perfmon trace.

Related Solutions

Sql-server – AlwaysOn Secondary Replica DB-size

.mdf and .ldf files do not show the last modification date in the way you think it does.

The file-system date reflects the last time the instance started the database. Modifications to data continue while the instance has the database open, without the instance actually updating the file-system dates for the files in use by the database.

Can you add the output of sp_helpdb 'dbname' for both instances to your question? Also add output from:

SELECT d.name
    , mf.name
    , mf.type_desc
    , mf.size * 8192E0 / 1048576 AS FileSizeMB
    , mf.state_desc, mf.max_size * 8192E0 / 1048576 AS MaxFileSizeMB
    , mf.growth  * 8192E0 / 1048576 AS FileGrowthMB
FROM master.sys.databases d
    INNER JOIN master.sys.master_files mf ON d.database_id = mf.database_id
WHERE d.name = 'dbname';

The following query, when run from the database in question, will show the number of megabytes allocated and used:

SELECT sum(total_pages) * 8192E0 / 1048576 as TotalMB
    , sum(used_pages) * 8192E0 / 1048576 as UsedMB
FROM sys.allocation_units;

Sql-server – Change Data Capture for Asynchronous Secondary Replica

Change Tracking and Change Data Capture are, while having similar goals, actually separate features. For AlwaysOn Availability groups, the limitations indicate that you can't use the synchronous Change Tracking feature at all, though the asynchronous Change Data Capture is possible to enable: It's just not a feature Microsoft will support for Active Secondaries.

Generally, things like that aren't supported because there exists a supported way to do so. In this case, enabling CDC on the primary node of an AlwaysOn Availability Group is completely supported, allowing the captured changes to be replicated along to the Active Secondaries, so would likely be the path you should explore for your SSIS needs on the secondaries.

You can read more about the differences between Change Tracking and Change Data Capture here, and more about setting up CDC on the primary replica here.

Best Answer

Related Solutions

Sql-server – AlwaysOn Secondary Replica DB-size

Sql-server – Change Data Capture for Asynchronous Secondary Replica

Related Question