Troubleshooting Non-Buffer Latch Contention: LATCH_XX

Latches are lightweight synchronization primitives that are used by the SQL Server engine to guarantee consistency of in-memory structures including; index, data pages and internal structures

Latch waits are one of the possible causes of slowness in SQL Server which can limit the throughput (tps) of OLTP application.

Whenever we experience slowness in performance of SQL Server, one of the few DMVs we are interested to look is sys.dm_os_wait_stats.Cumulative wait information is tracked by SQL Server and can be accessed using the Dynamic Management View (DMW) sys.dm_os_wait_stats. SQL Server employs three latch wait types as defined by the corresponding “wait_type” in the sys.dm_os_wait_stats DMV:

Buffer (BUF) latch: used to guarantee consistency of index and data pages for user objects. They are also used to protect access to data   pages    that SQL Server uses for system objects. For example pages that manage allocations are protected by buffer latches. These include the Page Free Space (PFS), Global Allocation Map (GAM), Shared Global Allocation Map (SGAM) and Index Allocation Map (IAM) pages. Buffer latches are reported in sys.dm_os_wait_stats with a wait_type of PAGELATCH_*.

Non-buffer (Non-BUF) latch: used to guarantee consistency of any in-memory structures other than buffer pool pages. Any waits for non-buffer latches will be reported as a wait_type of LATCH_*.

IO latch: a subset of buffer latches that guarantee consistency of the same structures protected by buffer latches when these structures require loading into the buffer pool with an I/O operation. IO latches prevent another thread loading the same page into the buffer pool with an incompatible latch. Associated with a wait_type of PAGEIOLATCH_*.

Troubleshooting Buffer Latch contention is very nicely documented in the whitepaper here by SQLCAT Team

If you see significant PAGEIOLATCH waits it means that SQL Server is waiting on the I/O subsystem. While a certain amount of PAGEIOLATCH waits is expected and normal behavior, if the average PAGEIOLATCH wait times are consistently above 10 milliseconds (ms) you should investigate why the I/O subsystem is under pressure.

However, there is less documentation available on Non-Buffer latch but it is scattered and  hence I thought it would be worth to compose this blog

  1.  Check sys.dm_os_wait_stats to      confirm whether LATCH_XX is the major wait on the SQL Server
    SELECT * from sys.dm_os_wait_stats
    Order by wait_time_ms desc
  2. If LATCH_XX is found to be  major wait, next check sys.dm_os_latch_stats which is major non-buffer      latch on which SQL Server is waiting on
    SELECT   latch_class, wait_time_ms / 1000.0 AS [Wait In sec],
    waiting_requests_count AS [Count of wait],
    100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,
    FROM sys.dm_os_latch_stats
    WHERE latch_class NOT IN ('BUFFER')
    AND wait_time_ms > 0
  3. Monitor master.dbo.sysprocesses  for waitresource column

Based on the above data collection, you may find the one of the following possible cause of excessive non-buffer page latch (in my experience).

  • Is there any profiler trace in particular client side tracing running on the server

When you run SQL Profiler against an instance of SQL Server 2000 or of SQL Server 2005, each active SPID of the instance requires an exclusive latch (LATCH_EX) to write the event to SQL Profiler. The SPID writes the event to the local memory buffer first. Then, the event is transferred to SQL Profiler. If the local memory buffer is full, the SPID discards the event. After the events in the local memory buffer are transferred to SQL Profiler, the SPID can write new events to the local memory buffer. If other SPIDs are waiting to write events to SQL Profiler, these SPIDs wait in a queue.

You can detect this case when you will see TRACE,TRACE_CONTROLLER,TRACE_IO_SYNC as one of the major latch waits, further in master.dbo.sysprocesses you will see TRACE related waitresource column

  • Excessive  Parallelism

While parallelism is not the direct cause of NON-BUF Latches, however whenever we have excessive parallelism on muti-core processors machine (generally more than 16+ logical processors) , almost all the queries qualifies for the parallel plans.While LATCH_XX latches are used to synchronize internal structures used in parallel plan.

You can detect this case when you see ACCESS_METHODS_SCAN_RANGE_GENERATOR, ACCESS_METHODS_KEY_RANGE_GENERATOR as one of the major waittypes in sys.dm_os_latch_stats.

Further you may also see high CXPACKETS in sys.dm_os_wait_stats alongwith LATCH_EX latches as major waits. While waitresource column in master.dbo.sysprocesses might show waits on PARALLEL_PAGE_SUPPLIER.

  • Very Heavy  insert/update/deletes on Heap/BLOB data structures on the SQL instance

When there is heavy inserts/updates/deletes on heap/BLOB data structures, the internal memory structures which is used for allocation/deallocation of the pages to heap are synchronized using Non-BUF Page Latch.In one of my customer site, they had an audit table which was a heap table which was inserted heavily since every transaction in the application is logged in the audit table which causes excessive waits on LATCH_EX waits.

You can detect this case when you see  ALLOC_FREESPACE_CACHE, ALLOC_EXTENT_CACHE, ALLOC_CREATE_FREESPACE_CACHE as one of the major waittype in sys.dm_os_latch_stats

While waitresource column in master.dbo.sysprocesses might show waits on IDES

  • Excessive  Auto-grow or Autoshrink Operation

Due to insufficient sizing or capacity planning, you might experience frequent Autogrow or if you Autoshrink turned ON, you might see frequent AUTOSHRINK. During growing or shrinking, SQL Server acquires FCB, FGCB_ALLOC latch to synchronize the access to the filegroup.You can detect this case when you see FCB, FGCB_ALLOC as one of the major waittype in sys.dm_os_latch_stats.

In this case, the waitresource columns in master.dbo.sysprocesses might show waits on FCB and FGCB_ALLOC

Please Note: The above information is purely based on my research and experience. The content should be treated AS IS and under no circumstance should be treated as offical documentation from Microsoft

Parikshit Savjani
Premier Field Engineer

2 comments

  1. Thanks a lot! In my case I was seeing high wait times on types: PREEMPTIVE_OS_FLUSHFILEBUFFERS and LATCH_SH which was caused by a database file autogrow of 1MB :). After changing the autogrow to 1024MB the issues did not reoccur.

Leave a Reply to Maneesh Cancel reply

Your email address will not be published. Required fields are marked *