Recently I got an opportunity to present at the SQL Server day organized by www.SQLServerGeeks.com on SQL 2012 Indirect Checkpoints.In this post I would like to explain the same for the benefit of wider audience.
The focus of SQL 2012 has always been to provide a reliable,robust and Highly Available Database Solution to give the customer enough confidence to host their Tier 1 data on SQL Server.
AlwaysON and Indirect Checkpoints are some of the new changes in SQL 2012 which gives you an assurance of High Availability with minimum downtime.
I wouldn’t say indirect Checkpoints as a new feature of SQL 2012 since it is just a change in the algorithm but a very important change as well since it overcomes the flaws of the previous algorithm.
Checkpoints are mainly used to minimize the recovery time interval. At Checkpoints all the dirty buffers are flushed to disk so after the checkpoint is finished we are sure that datafile is consistent and contains all the committed transactions which occurred before checkpoint.
The Recovery Time Objective (RTO) for a given database decides the frequency of the checkpoints.
In SQL Server, we have configuration parameter viz recovery_interval(min) which is used to set your recovery time objective. By default recovery_interval is set to 0 which sets the RTO to 1 min, so SQL Server will internally calculate the frequency of the checkpoints so that recovery should not take more than 1 min to bring the database online.
Let us understand the current checkpoint algorithm and its challenges
Current Checkpoint Algorithm
- In our current checkpoint algorithm, based on the recovery interval(min), SQL Server internally calculates the regular intervals in the Tlog at which the checkpoint happens. In other words, checkpoint frequency is dependent on the no. of transactions in the Transaction log.
- However a given transaction can insert/update/delete multiple dirty buffers.
- So let us say after a Checkpoint we have some transactions which are inserting/updating/deleting large number of dirty buffer and if the crash happens after that, all these dirty buffers needs to be recovered by SQL Server.
- The recovery time is mainly dependent on the number of dirty buffers to recover.
- However our current checkpoint algorithm doesn’t account for the number of dirty buffer to be recovered.
- Hence Recovery Interval(min) cannot be predicted our guaranteed.
- Further, the flushing of dirty buffers occurs only at checkpoints and doesn’t occur anywhere in between which results in IO spikes at the time of checkpoint.
Challenges of Current Checkpoint Algorithm
Indirect Checkpoint Algorithm
In the new Indirect Checkpoint Algorithm
- We need to first set target_recovery_time which is a new database level setting.
- Once the target_recovery_time is set, SQL Server internally calculates target dirty buffer threshold.
- As the transaction are logged in the Tlog, the LSN and dirty buffers which are modified by the transaction are tracked in Dirty Page List. So in the indirect checkpoint algorithm, we now track the dirty buffers from each transaction alongwith their LSN.
- A new Background process, called Recovery Writer is introduced starting SQL 2012. The job of the recovery writer is to periodically POLL the Dirty Page List to see the following
IF ( No. of Dirty Pages in BPOOL > Target Dirty Buffer Threshold)
Flush Dirty Buffers
Move MinRecoveryLSN to the next transaction until which the dirty buffers are flushed
- Recovery Writer is a new background thread which wakes up periodically based on a timed event to poll the dirty page list
- DIRTY_PAGE_POLL is a new system waittype introduced to support timed wait of RECOVERY WRITER thread and if you see it as one of the high wait events in sys.dm_os_wait_stats you can safely ignore it.
- With the new recovery writer thread, SQL 2012 ensures at any given point in time there is not more than target dirty buffers in the Buffer Pool, which can easily recovered within target recovery time.
- The Checkpoint Background process is now not responsible for flushing the pages but instead is used to calculate the MinRecoveryLSN
You can view the new recovery writer thread in the SQL Instance using the following query
SELECT * FROM sysprocesses where spid=2
With the new indirect Checkpoint algorithm ,
- Target Recovery Time can be guaranteed and is predictable since the dirty page list is tracked by SQL Server.
- IO spikes occurring at Checkpoints are now smoothen out since the checkpoint process doesn’t flush dirty buffers but the Average IO Workload of the SQL Server will increase.
- Setting the Target Recovery Time too small, can cause background recovery writer to run aggressively and thereby increases the IO workload for large OLTP systems.
POINTS TO REMEMBER
CALL TO ACTION
Premier Field Engineer
Disclaimer: This post is provided “AS IS” with no warranties, and confers no rights
Note: This is my interpretation of the Checkpoint Algorithm. This blog should not be treated as official document or record exposed by Microsoft.