SQL 2012: Indirect Checkpoint Explained !!!

Recently I got an opportunity to present at the SQL Server day organized by www.SQLServerGeeks.com on SQL 2012 Indirect Checkpoints.In this post I would like to explain the same for the benefit of wider audience.

The focus of SQL 2012 has always been to provide a reliable,robust and Highly Available Database Solution to give the customer enough confidence to host their Tier 1 data on SQL Server.

AlwaysON and Indirect Checkpoints are some of the new changes in SQL 2012 which gives you an assurance of High Availability with minimum downtime.

I wouldn’t say indirect Checkpoints as a new feature of SQL 2012 since it is just a change in the algorithm but a very important change as well since it overcomes the flaws of the previous algorithm.

Checkpoints are mainly used to minimize the recovery time interval. At Checkpoints all the dirty buffers are flushed to disk so after the checkpoint is finished we are sure that datafile is consistent and contains all the committed transactions which occurred before checkpoint.

The Recovery Time Objective (RTO) for a given database decides the frequency of the checkpoints.

In SQL Server, we have configuration parameter viz recovery_interval(min) which is used to set your recovery time objective. By default recovery_interval is set to 0 which sets the RTO to 1 min, so SQL Server will internally calculate the frequency of the checkpoints so that recovery should not take more than 1 min to bring the database online.

Let us understand the current checkpoint algorithm and its challenges

Current Checkpoint Algorithm

In our current checkpoint algorithm, based on the recovery interval(min), SQL Server internally calculates the regular intervals in the Tlog at which the checkpoint happens. In other words, checkpoint frequency is dependent on the no. of transactions in the Transaction log.
However a given transaction can insert/update/delete multiple dirty buffers.
So let us say after a Checkpoint we have some transactions which are inserting/updating/deleting large number of dirty buffer and if the crash happens after that, all these dirty buffers needs to be recovered by SQL Server.
The recovery time is mainly dependent on the number of dirty buffers to recover.
However our current checkpoint algorithm doesn’t account for the number of dirty buffer to be recovered.
Hence Recovery Interval(min) cannot be predicted our guaranteed.
Further, the flushing of dirty buffers occurs only at checkpoints and doesn’t occur anywhere in between which results in IO spikes at the time of checkpoint.

Challenges of Current Checkpoint Algorithm

Indirect Checkpoint Algorithm

In the new Indirect Checkpoint Algorithm

We need to first set target_recovery_time which is a new database level setting.
Once the target_recovery_time is set, SQL Server internally calculates target dirty buffer threshold.
As the transaction are logged in the Tlog, the LSN and dirty buffers which are modified by the transaction are tracked in Dirty Page List. So in the indirect checkpoint algorithm, we now track the dirty buffers from each transaction alongwith their LSN.
A new Background process, called Recovery Writer is introduced starting SQL 2012. The job of the recovery writer is to periodically POLL the Dirty Page List to see the following

IF ( No. of Dirty Pages in BPOOL > Target Dirty Buffer Threshold)
{
Flush Dirty Buffers
Move MinRecoveryLSN to the next transaction until which the dirty buffers are flushed
}

Recovery Writer is a new background thread which wakes up periodically based on a timed event to poll the dirty page list
DIRTY_PAGE_POLL is a new system waittype introduced to support timed wait of RECOVERY WRITER thread and if you see it as one of the high wait events in sys.dm_os_wait_stats you can safely ignore it.
With the new recovery writer thread, SQL 2012 ensures at any given point in time there is not more than target dirty buffers in the Buffer Pool, which can easily recovered within target recovery time.
The Checkpoint Background process is now not responsible for flushing the pages but instead is used to calculate the MinRecoveryLSN

You can view the new recovery writer thread in the SQL Instance using the following query

SELECT * FROM sysprocesses where spid=2

With the new indirect Checkpoint algorithm ,

Target Recovery Time can be guaranteed and is predictable since the dirty page list is tracked by SQL Server.
IO spikes occurring at Checkpoints are now smoothen out since the checkpoint process doesn’t flush dirty buffers but the Average IO Workload of the SQL Server will increase.
Setting the Target Recovery Time too small, can cause background recovery writer to run aggressively and thereby increases the IO workload for large OLTP systems.

POINTS TO REMEMBER

CALL TO ACTION

Parikshit Savjani
Premier Field Engineer

Disclaimer: This post is provided “AS IS” with no warranties, and confers no rights

Note: This is my interpretation of the Checkpoint Algorithm. This blog should not be treated as official document or record exposed by Microsoft.

5 Comments

Pingback: Target_recovery_time_in_seconds – Indirect checkpoint explained using extended events « SQLSailor
Pingback: Controlando seus checkpoints com a opção target_recovery_time_in_seconds no SQL Server 2012 | Vladimir M. B. Magalhães – Learn and Share
cmcghee says:

November 25, 2012 at 12:55 am

The normal testing procedure before issuing a DBCC DROPCLEANBUFFERS statement is to issue a CHECKPOINT statement first. If CHECKPOINT no longer is responsible for the dirty buffer flush, how can we produce a cold buffer, ie. trigger a DIRTY_PAGE_POLL by the Recovery Writer, prior to DBCC DROPCLEANBUFFERS for performance testing purposes when using Indirect Checkpoints on a database?

Parikshit says:

November 25, 2012 at 1:12 pm

The Manual CHECKPOINT command fired via TSQL now triggers the Recovery Writer to flush the dirty buffers. So you need to follow the same steps as you would do and it transparently takes care of achieving the same objective.

Pingback: Nova mensagem no Log do SQL Server 2012 | Fabrício Lima

SQL 2012: Indirect Checkpoint Explained !!!

Can number of worker threads in SQL Server exceed max worker threads setting?

Recovering from cryptic errors thrown when importing from PowerPivot

Failed to join the instance to the availability group while configuring AlwaysON

Web Application Analytics using IIS Logs & Powerview

When do we need asymmetric storage with AlwaysON FCI and Availability Storage?

Performance Tuning with Columnstore index using Batch Processing

5 Comments

Leave a Reply Cancel reply

Similar Posts

5 Comments

Leave a Reply Cancel reply