Latches are lightweight synchronization primitives that are used by the SQL Server engine to guarantee consistency of in-memory structures including; index, data pages and internal structures
Latch waits are one of the possible causes of slowness in SQL Server which can limit the throughput (tps) of OLTP application.
Whenever we experience slowness in performance of SQL Server, one of the few DMVs we are interested to look is sys.dm_os_wait_stats.Cumulative wait information is tracked by SQL Server and can be accessed using the Dynamic Management View (DMW) sys.dm_os_wait_stats. SQL Server employs three latch wait types as defined by the corresponding “wait_type” in the sys.dm_os_wait_stats DMV:
Buffer (BUF) latch: used to guarantee consistency of index and data pages for user objects. They are also used to protect access to data pages that SQL Server uses for system objects. For example pages that manage allocations are protected by buffer latches. These include the Page Free Space (PFS), Global Allocation Map (GAM), Shared Global Allocation Map (SGAM) and Index Allocation Map (IAM) pages. Buffer latches are reported in sys.dm_os_wait_stats with a wait_type of PAGELATCH_*.
Non-buffer (Non-BUF) latch: used to guarantee consistency of any in-memory structures other than buffer pool pages. Any waits for non-buffer latches will be reported as a wait_type of LATCH_*.
IO latch: a subset of buffer latches that guarantee consistency of the same structures protected by buffer latches when these structures require loading into the buffer pool with an I/O operation. IO latches prevent another thread loading the same page into the buffer pool with an incompatible latch. Associated with a wait_type of PAGEIOLATCH_*.
Troubleshooting Buffer Latch contention is very nicely documented in the whitepaper here by SQLCAT Team
If you see significant PAGEIOLATCH waits it means that SQL Server is waiting on the I/O subsystem. While a certain amount of PAGEIOLATCH waits is expected and normal behavior, if the average PAGEIOLATCH wait times are consistently above 10 milliseconds (ms) you should investigate why the I/O subsystem is under pressure.
However, there is less documentation available on Non-Buffer latch but it is scattered and hence I thought it would be worth to compose this blog
- Check sys.dm_os_wait_stats to confirm whether LATCH_XX is the major wait on the SQL Server
SELECT * from sys.dm_os_wait_stats Order by wait_time_ms desc
- If LATCH_XX is found to be major wait, next check sys.dm_os_latch_stats which is major non-buffer latch on which SQL Server is waiting on
SELECT latch_class, wait_time_ms / 1000.0 AS [Wait In sec], waiting_requests_count AS [Count of wait], 100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage, FROM sys.dm_os_latch_stats WHERE latch_class NOT IN ('BUFFER') AND wait_time_ms > 0
- Monitor master.dbo.sysprocesses for waitresource column
Based on the above data collection, you may find the one of the following possible cause of excessive non-buffer page latch (in my experience).
- Is there any profiler trace in particular client side tracing running on the server
When you run SQL Profiler against an instance of SQL Server 2000 or of SQL Server 2005, each active SPID of the instance requires an exclusive latch (LATCH_EX) to write the event to SQL Profiler. The SPID writes the event to the local memory buffer first. Then, the event is transferred to SQL Profiler. If the local memory buffer is full, the SPID discards the event. After the events in the local memory buffer are transferred to SQL Profiler, the SPID can write new events to the local memory buffer. If other SPIDs are waiting to write events to SQL Profiler, these SPIDs wait in a queue.
You can detect this case when you will see TRACE,TRACE_CONTROLLER,TRACE_IO_SYNC as one of the major latch waits, further in master.dbo.sysprocesses you will see TRACE related waitresource column
- Excessive Parallelism
While parallelism is not the direct cause of NON-BUF Latches, however whenever we have excessive parallelism on muti-core processors machine (generally more than 16+ logical processors) , almost all the queries qualifies for the parallel plans.While LATCH_XX latches are used to synchronize internal structures used in parallel plan.
You can detect this case when you see ACCESS_METHODS_SCAN_RANGE_GENERATOR, ACCESS_METHODS_KEY_RANGE_GENERATOR as one of the major waittypes in sys.dm_os_latch_stats.
Further you may also see high CXPACKETS in sys.dm_os_wait_stats alongwith LATCH_EX latches as major waits. While waitresource column in master.dbo.sysprocesses might show waits on PARALLEL_PAGE_SUPPLIER.
- Very Heavy insert/update/deletes on Heap/BLOB data structures on the SQL instance
When there is heavy inserts/updates/deletes on heap/BLOB data structures, the internal memory structures which is used for allocation/deallocation of the pages to heap are synchronized using Non-BUF Page Latch.In one of my customer site, they had an audit table which was a heap table which was inserted heavily since every transaction in the application is logged in the audit table which causes excessive waits on LATCH_EX waits.
You can detect this case when you see ALLOC_FREESPACE_CACHE, ALLOC_EXTENT_CACHE, ALLOC_CREATE_FREESPACE_CACHE as one of the major waittype in sys.dm_os_latch_stats
While waitresource column in master.dbo.sysprocesses might show waits on IDES
- Excessive Auto-grow or Autoshrink Operation
Due to insufficient sizing or capacity planning, you might experience frequent Autogrow or if you Autoshrink turned ON, you might see frequent AUTOSHRINK. During growing or shrinking, SQL Server acquires FCB, FGCB_ALLOC latch to synchronize the access to the filegroup.You can detect this case when you see FCB, FGCB_ALLOC as one of the major waittype in sys.dm_os_latch_stats.
In this case, the waitresource columns in master.dbo.sysprocesses might show waits on FCB and FGCB_ALLOC
Please Note: The above information is purely based on my research and experience. The content should be treated AS IS and under no circumstance should be treated as offical documentation from Microsoft
Premier Field Engineer
Good Blog. Fantastic
Thanks a lot! In my case I was seeing high wait times on types: PREEMPTIVE_OS_FLUSHFILEBUFFERS and LATCH_SH which was caused by a database file autogrow of 1MB :). After changing the autogrow to 1024MB the issues did not reoccur.