Extended Events and LOCK_HASH Spinlock: Lessons from the Field

Today, I want to share an intriguing case that might change how you use extended event sessions. It involves a LOCK_HASH spinlock issue that caused significant performance problems on a busy SQL Server. This case will show you the hidden costs of collecting statement-level events.

Issue Overview

The issue arose on:

  • Microsoft SQL Server 2022 (RTM-CU8) (KB5029666) – 16.0.4075.1 (X64)
  • 2 sockets, 32 cores (VMware)
  • 80 GB RAM
  • Average transactions per second: 6500
  • Average CPU utilization: 22%

My attention was drawn to this problem when I noticed an immediate 5-8% increase in average CPU utilization, which caused a noticeable slowdown in performance. The graph below illustrates the impact on CPU usage as monitored in the existing setup:

Investigation Steps

To identify the root cause, we checked several potential factors:

  1. Code Release: Verified if any new code was released recently.
  2. CPU-Intensive Queries: Investigated if specific queries were consuming more CPU.
  3. Plan Changes: Checked for any plan changes in stored procedures.
  4. Data Changes: Looked for significant changes in data, such as bulk loads or mass updates.
  5. System Sessions: Analyzed utilization from system sessions, including ghost cleanup processes.

Despite these checks, the overall user workload continued to perform slower than expected.

Investigating Spinlock Contention

The next step was to check for spinlock contention, and we encountered a significant surprise:
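A minimal sketch of the kind of check involved (sys.dm_os_spinlock_stats is cumulative since instance startup, so for a live picture you compare two snapshots taken a short interval apart):

--Busiest spinlocks since instance startup (cumulative counters)
SELECT TOP (10)
	name,
	collisions,
	spins,
	spins_per_collision,
	backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;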


Understanding LOCK_HASH Spinlock

According to the official documentation, the LOCK_HASH spinlock protects access to the lock manager hash table, which stores information about the locks being held in a database. For more details, refer to KB2926217 and the Transaction Locking and Row Versioning Guide.

From SQL Server Spinlocks – sqlshack.com:

Spinlock contention, particularly related to the LOCK_HASH spinlock type, can cause performance issues. Large values of LOCK_HASH spinlocks, combined with high numbers of spins, collisions, and backoffs, suggest potential spinlock contention. The common causes and mitigation strategies include:

  • Using the shortest possible transactions
  • Ensuring the application does not use multiple sessions for updating the same data
  • Turning off page-level locking
  • Closely tracking lock escalation to reduce contention on the hash bucket
  • Potentially using the NOLOCK hint in queries when multiple threads try to read the same data simultaneously

Case Analysis

In my situation, none of the documented potential root causes applied. The issue coincided with the start of an extended event session. By simplifying the session in a few steps, I isolated the field causing the spinlock contention: as soon as the ‘statement’ field of the ‘sp_statement_completed’ event was collected, the issue could be reproduced consistently. Code example below:

CREATE EVENT SESSION [hash_spinlock_test] ON SERVER 
ADD EVENT sqlserver.sp_statement_completed(SET collect_object_name=(1),collect_statement=(1)
    ACTION(sqlserver.client_hostname,sqlserver.database_id,sqlserver.session_id,sqlserver.username)
    WHERE ([duration]>=(100000)))
ADD TARGET package0.event_file(SET filename=N'hash_spinlock_test.xel',max_file_size=(100))

There is a filter configured for duration, so we capture only events lasting at least 100,000 microseconds (0.1 s), and no statements on our system run that long. However, right after the session starts, LOCK_HASH spinlock usage skyrockets while the session records zero events. The worst part is that stopping the extended event session does not reduce the spinlock contention. In my case, the only solution was to restart the instance.
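For anyone who wants to observe the effect on their own (non-production) instance, a sketch along these lines is enough — snapshot LOCK_HASH, start the session defined above, wait a bit, and snapshot again; the one-minute delay is arbitrary:

--Baseline for LOCK_HASH before the session starts
SELECT name, collisions, spins, backoffs
FROM sys.dm_os_spinlock_stats
WHERE name = N'LOCK_HASH';

--Start the test session defined above
ALTER EVENT SESSION [hash_spinlock_test] ON SERVER STATE = START;

--Let the regular workload run for a while (interval chosen arbitrarily)
WAITFOR DELAY '00:01:00';

--Compare: on the affected instance, spins and backoffs climb sharply
SELECT name, collisions, spins, backoffs
FROM sys.dm_os_spinlock_stats
WHERE name = N'LOCK_HASH';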

You might think I’m complaining about nothing and that this extra CPU utilization doesn’t impact workload. To illustrate the impact, here’s a comparison of average execution times for our top stored procedures:

Stored Procedure | Avg Time (ms) Without LOCK_HASH Spinlock | Avg Time (ms) With LOCK_HASH Spinlock Issue | Difference (%)
SP one           | 29                                        | 41                                          | 41%
SP two           | 49                                        | 50                                          | 2%
SP three         | 38                                        | 51                                          | 34%

As shown, our top stored procedure became 41% slower.

Diagnosing the Issue

The easiest way to confirm if you have the same issue is to monitor the “Lock Requests/sec” performance counter. In my case, it increased from 170k to over 700k and didn’t normalize until a restart. This can be a quick diagnostic tool. If you notice similar symptoms, the spinlock monitoring techniques described in my previous post could help confirm the issue.
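The counter can also be sampled from T-SQL. Note that per-second counters in sys.dm_os_performance_counters are cumulative, so the sketch below takes two samples ten seconds apart; the LIKE filter on object_name is there because the prefix differs between default and named instances:

--Sample 'Lock Requests/sec' twice and compute an approximate rate
DECLARE @first BIGINT, @second BIGINT;

SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Locks%'
	AND counter_name = 'Lock Requests/sec'
	AND instance_name = '_Total';

WAITFOR DELAY '00:00:10';

SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Locks%'
	AND counter_name = 'Lock Requests/sec'
	AND instance_name = '_Total';

SELECT (@second - @first) / 10 AS lock_requests_per_sec;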

Conclusion

It’s advisable to avoid collecting the statement field in the sp_statement_completed event on a busy server, as it can lead to significant performance degradation. Even after stopping the extended event session, the issue with the LOCK_HASH spinlock persists. Currently, the only workaround I know is to restart the SQL Server instance.

Open Questions

  1. Bug or Design? Is this behavior a bug or by design? Collecting statement-level information is understandably expensive, but it raises the question: why call all these functions to retrieve the statement text if, in the end, the extended event session records zero events? Fetching each statement would only seem necessary if the statement itself were part of the filter predicate.
  2. Post-Session Issue Persistence: Why does the LOCK_HASH spinlock contention remain high and the server remain slow even after stopping the extended event session?

These questions highlight the need for further investigation to fully understand and mitigate the impact of collecting statement-level events on SQL Server performance.

Call Stack Examples

I have collected the call stack for a short period, and the top three entries are shown below:

top 1

00 SqlDK!GenericEvent::PublishAndCallAction
01 SqlDK!XeSosPkg::spinlock_backoff::Publish
02 SqlDK!SpinlockBase::Sleep
03 SqlDK!SpinlockBase::Backoff
04 sqlmin!Spinlock<187,7,257>::SpinToAcquireWithExponentialBackoff
05 sqlmin!lck_unlockInternal
06 sqlmin!BlobManager::UnLockRoot
07 sqlmin!BlobManager::Cleanup
08 sqlmin!LockBytesSS::GoDormant
09 sqlmin!LockBytesSS::Release
0a sqlmin!CMEDScanLob::`vector deleting destructor'
0b sqlmin!CMEDAccess::~CMEDAccess
0c sqlmin!CMEDAccess::`vector deleting destructor'
0d sqllang!CSQLObject::GetCharsFromModule
0e sqllang!CSQLObject::CbGetChars
0f sqllang!TraceUtil::GetStatementInfoBase<1>
10 sqllang!TraceUtil::GetStatementInfo
11 sqllang!CMsqlExecContext::ExecuteStmts<1,0>
12 sqllang!CMsqlExecContext::FExecute
13 sqllang!CSQLSource::Execute
14 sqllang!CStmtExecProc::XretLocalExec
15 sqllang!CStmtExecProc::XretExecExecute
16 sqllang!CXStmtExecProc::XretExecute
17 sqllang!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn
18 sqllang!CMsqlExecContext::ExecuteStmts<1,0>
19 sqllang!CMsqlExecContext::FExecute
1a sqllang!CSQLSource::Execute
1b sqllang!CStmtExecProc::XretLocalExec
1c sqllang!CStmtExecProc::XretExecExecute
1d sqllang!CXStmtExecProc::XretExecute
1e sqllang!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn
1f sqllang!CMsqlExecContext::ExecuteStmts<1,0>
20 sqllang!CMsqlExecContext::FExecute
21 sqllang!CSQLSource::Execute

top 2

00 SqlDK!GenericEvent::PublishAndCallAction
01 SqlDK!XeSosPkg::spinlock_backoff::Publish
02 SqlDK!SpinlockBase::Sleep
03 SqlDK!SpinlockBase::Backoff
04 sqlmin!Spinlock<187,7,257>::SpinToAcquireWithExponentialBackoff
05 sqlmin!lck_lockInternal
06 sqlmin!BlobBase::GetLock
07 sqlmin!BlobBase::LockRID
08 sqlmin!BlobManager::SetupInRowBlobRoot
09 sqlmin!BlobManager::OpenInternal
0a sqlmin!BlobManager::Open
0b sqlmin!LockBytesSS::Open
0c sqlmin!OpenLockBytesNew
0d sqlmin!GetDataAsILockBytes<0>
0e sqlmin!IndexDataSetSession::GetDataLong<0>
0f sqlmin!IndexDataSetSession::GetDataInternal
10 sqlmin!RowsetNewSS::GetData
11 sqlmin!CMEDScan::GetLOB
12 sqlmin!CMEDCatYukonObject::GetSQLExprLob
13 sqlmin!CMEDProxySQLExpression::GetSQLExprLobSrc
14 sqllang!CSQLObject::GetCharsFromModule
15 sqllang!CSQLObject::CbGetChars
16 sqllang!TraceUtil::GetStatementInfoBase<1>
17 sqllang!TraceUtil::GetStatementInfo
18 sqllang!CMsqlExecContext::ExecuteStmts<1,0>
19 sqllang!CMsqlExecContext::FExecute
1a sqllang!CSQLSource::Execute
1b sqllang!CStmtExecProc::XretLocalExec
1c sqllang!CStmtExecProc::XretExecExecute
1d sqllang!CXStmtExecProc::XretExecute
1e sqllang!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn
1f sqllang!CMsqlExecContext::ExecuteStmts<1,0>
20 sqllang!CMsqlExecContext::FExecute
21 sqllang!CSQLSource::Execute

top 3

00 SqlDK!GenericEvent::PublishAndCallAction
01 SqlDK!XeSosPkg::spinlock_backoff::Publish
02 SqlDK!SpinlockBase::Sleep
03 SqlDK!SpinlockBase::Backoff
04 sqlmin!Spinlock<187,7,257>::SpinToAcquireWithExponentialBackoff
05 sqlmin!lck_lockInternal
06 sqlmin!GetDataLock
07 sqlmin!BTreeRow::AcquireLock
08 sqlmin!IndexDataSetSession::GetRowByKeyValue
09 sqlmin!IndexDataSetSession::FetchRowByKeyValueInternal
0a sqlmin!RowsetNewSS::FetchRowByKeyValueInternal
0b sqlmin!RowsetNewSS::FetchRowByKeyValue
0c sqlmin!CMEDScan::StartSearch
0d sqlmin!CMEDCatYukonObject::GetSQLExprLob
0e sqlmin!CMEDProxySQLExpression::GetSQLExprLobSrc
0f sqllang!CSQLObject::GetCharsFromModule
10 sqllang!CSQLObject::CbGetChars
11 sqllang!TraceUtil::GetStatementInfoBase<1>
12 sqllang!TraceUtil::GetStatementInfo
13 sqllang!CMsqlExecContext::ExecuteStmts<1,0>
14 sqllang!CMsqlExecContext::FExecute
15 sqllang!CSQLSource::Execute
16 sqllang!CStmtExecProc::XretLocalExec
17 sqllang!CStmtExecProc::XretExecExecute
18 sqllang!CXStmtExecProc::XretExecute
19 sqllang!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn
1a sqllang!CMsqlExecContext::ExecuteStmts<1,0>
1b sqllang!CMsqlExecContext::FExecute
1c sqllang!CSQLSource::Execute
1d sqllang!CStmtExecProc::XretLocalExec
1e sqllang!CStmtExecProc::XretExecExecute
1f sqllang!CXStmtExecProc::XretExecute
20 sqllang!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn
21 sqllang!CMsqlExecContext::ExecuteStmts<1,0>

The Hidden Challenges of Trace Flag 834 for SQL Server


Today, we’re diving into a crucial feature of Microsoft SQL Server, Trace Flag (TF) 834, which could potentially speed up your database. As modern servers boast large amounts of memory, the idea of utilizing larger memory pages through this trace flag seems appealing. But should you really use it? Let’s explore.

What Does “Large Page” Mean?

A “large page” refers to allocating memory in larger blocks, which lets the system allocate and map memory more efficiently. Normally, Windows manages memory in 4 KB pages on x64 systems, but with large pages these grow to at least 2 MB, and on some hardware up to 16 MB. For a deeper dive, “Microsoft Windows Internals” by Mark Russinovich is a great resource.

The Benefits of Using Large Pages

One key advantage, as highlighted in Temenos T24 – Trace Flag 834 and When to Use It – Microsoft Community Hub by the SQL Server Team, is that large pages can boost performance by making the translation lookaside buffer (TLB) in the CPU more efficient. In other words, TF 834 increases the efficiency of the virtual-to-physical memory address translation performed by the memory management hardware. If you’re looking to enable this feature, detailed instructions are available in the mentioned article.
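Before and after enabling the flag, it is worth confirming whether large pages are actually being used; a quick check (TF 834 only takes effect as a startup parameter, so DBCC TRACESTATUS simply reports whether it is active):

--Is TF 834 active? (it can only be enabled via the -T834 startup parameter)
DBCC TRACESTATUS (834, -1);

--How much memory is currently allocated with large pages?
SELECT physical_memory_in_use_kb,
	large_page_allocations_kb,
	locked_page_allocations_kb
FROM sys.dm_os_process_memory;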

The Downsides of Large Pages

Despite the seeming lack of drawbacks to using Trace Flag 834, caution is advised. Official guidance suggests avoiding it with columnstore indexes. Moreover, I’ve encountered scenarios where enabling large pages significantly increased kernel CPU usage, causing SQL Server to slow down dramatically. A server restart was the only workaround.

Real-Life Example

In one instance, kernel CPU usage skyrocketed from 2-3% to 30-35% after a few hours of uptime with large pages enabled, severely impacting SQL Server performance. The issue seems to extend beyond SQL Server itself, hinting at a deeper interaction with the OS kernel.

Part of the call stack that confirms you are facing the very same issue:

...
|sqldk.dll!MemoryClerkInternal::FreePage
|sqldk.dll!MemoryClerkInternal::FreePageInline
|sqldk.dll!MemoryNode::FreePageInternal
|sqldk.dll!SOS_MemoryWorkSpace::FreePage
||-KernelBase.dll!VirtualFree
||-sqldk.dll!SOS_MemoryWorkSpace::DeCommitBlock

The Windows Performance Toolkit (Windows Performance Toolkit | Microsoft Learn) can be used to capture such a call stack.

The issue was solved by removing TF 834 from SQL Server startup parameters.

Conclusion

I’ve run into this problem with SQL Server 2019 and 2022, and it might be something others face too, no matter the version they’re using. This issue might look different on different servers, like those running Linux, because of how they manage memory. So, it’s really important to be careful when using Trace Flag 834 and to test it thoroughly.

Got any questions or want to share your own experience? Please leave a comment and let’s talk about it.

Unlocking Spinlocks: Long-Term Monitoring in SQL Server

In this post

In this post, I’m going to talk about spinlocks in SQL Server, starting with an easy-to-understand explanation of what spinlocks are. I’ll also provide practical advice on how to monitor and capture spinlock information over the long term, ensuring your SQL Server environment remains efficient and responsive.

What is a spinlock?

Spinlocks are lightweight locks used in SQL Server to manage access to data structures when multiple processes try to use them simultaneously. Think of them as quick checks: if a process wants to use a resource that is already in use, it “spins” in place, checking repeatedly and rapidly until the resource becomes available, consuming CPU cycles while it waits. This gives very fast access control but can lead to high CPU usage if many processes spin frequently. The approach is efficient for very short operations because it avoids the overhead of heavier locking mechanisms, allowing the system to handle many operations smoothly without significant delays.

SQL Server uses spinlocks to protect various low-level, critical data structures from concurrent access. This includes structures like caches (e.g., buffer pool), internal lists, and other metadata that require fast, efficient access control to ensure the database’s integrity and performance during high levels of concurrency.

Why do I need to monitor spinlocks in the long term?

Monitoring spinlocks in the long term is essential for maintaining SQL Server performance. High contention on spinlocks can lead to significant CPU usage, impacting overall system responsiveness. It’s important to note that starting spinlock monitoring can be almost impossible when you’re already facing an issue with 100% CPU utilization. By proactively monitoring spinlock behavior, you can identify and address potential problems before they escalate, ensuring that your database environment remains efficient, responsive, and capable of handling high concurrency levels without degradation in performance.

How to set up long-term spinlock monitoring.


To effectively monitor spinlocks, we’ll set up two tables: one for storing raw data from sys.dm_os_spinlock_stats and another for holding the calculated differences between two snapshots. Additionally, two stored procedures will be introduced—one for capturing the current state and another for calculating the differences between snapshots. This approach allows for a comprehensive analysis of spinlock activity over time. Please check the code below:

--Create tables
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[spinlock_monitor]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[spinlock_monitor](
	[runtime] [datetime] NOT NULL,
	[name] [nvarchar](256) NOT NULL,
	[collisions] [bigint] NULL,
	[spins] [bigint] NULL,
	[spins_per_collision] [real] NULL,
	[sleep_time] [bigint] NULL,
	[backoffs] [bigint] NULL
) ON [PRIMARY]
WITH
(
DATA_COMPRESSION = PAGE
)
END
GO

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[spinlock_monitor]') AND name = N'Ci')
CREATE CLUSTERED INDEX [Ci] ON [dbo].[spinlock_monitor]
(
	[runtime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[spinlock_monitor_timeseries]') AND type IN (N'U'))
BEGIN
CREATE TABLE [dbo].[spinlock_monitor_timeseries](
	[runtime] [DATETIME] NOT NULL,
	[name] [NVARCHAR](256) NOT NULL,
	[collisions] [BIGINT] NULL,
	[spins] [BIGINT] NULL,
	[spins_per_collision] [REAL] NULL,
	[sleep_time] [BIGINT] NULL,
	[backoffs] [BIGINT] NULL
) ON [PRIMARY]
WITH
(
DATA_COMPRESSION = PAGE
)
END
GO


IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[spinlock_monitor_timeseries]') AND name = N'CiSpinlock_monitor_timeseries')
CREATE CLUSTERED INDEX [CiSpinlock_monitor_timeseries] ON [dbo].[spinlock_monitor_timeseries]
(
	[runtime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[spinlock_monitor_timeseries]') AND name = N'Unique_TimeAndName')
CREATE UNIQUE NONCLUSTERED INDEX [Unique_TimeAndName] ON [dbo].[spinlock_monitor_timeseries]
(
	[runtime] ASC,
	[name] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO

--collect_spinlock_monitor

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[collect_spinlock_monitor]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[collect_spinlock_monitor]

GO

CREATE PROCEDURE [dbo].[collect_spinlock_monitor]
AS
BEGIN
SET NOCOUNT ON
SET LANGUAGE english;
INSERT INTO spinlock_monitor
SELECT GETDATE() [runtime], *
FROM
sys.dm_os_spinlock_stats WHERE collisions <>0
ORDER BY spins DESC

DELETE spinlock_monitor WHERE [runtime] < DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE()-14))

END
GO

--calculation_spinlock_monitor
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[calculation_spinlock_monitor]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[calculation_spinlock_monitor]
go

CREATE PROCEDURE [dbo].[calculation_spinlock_monitor] @StartTime DATETIME = NULL
	,@EndTime DATETIME = NULL
AS
BEGIN

	SET NOCOUNT ON;
	SET LANGUAGE english;

	DECLARE @RowCount INT;

-- Count the number of rows returned by the query
SELECT @RowCount = COUNT(*)
FROM (
    SELECT DISTINCT TOP 2 runtime
    FROM [dbo].[spinlock_monitor]
    ORDER BY runtime ASC
) AS Subquery;

-- Check the row count and run code accordingly
IF @RowCount = 2
BEGIN

	IF @StartTime IS NULL
		AND @EndTime IS NULL
	BEGIN
		SELECT @StartTime = MAX(runtime)
			,@EndTime = MIN(runtime)
		FROM (
			SELECT DISTINCT TOP 2 runtime
			FROM [dbo].[spinlock_monitor]
			ORDER BY runtime ASC
			) AS LatestDates;
	END;

	-- Declare temporary tables for initial and subsequent snapshots
	DECLARE @InitialSnapshot TABLE (
		[runtime] DATETIME
		,[name] VARCHAR(255)
		,[collisions] BIGINT
		,[spins] BIGINT
		,[spins_per_collision] REAL
		,[sleep_time] BIGINT
		,[backoffs] BIGINT
		);
	DECLARE @SubsequentSnapshot TABLE (
		[runtime] DATETIME
		,[name] VARCHAR(255)
		,[collisions] BIGINT
		,[spins] BIGINT
		,[spins_per_collision] REAL
		,[sleep_time] BIGINT
		,[backoffs] BIGINT
		);
	DECLARE @ExactStartTime DATETIME;
	DECLARE @ExactEndTime DATETIME;

	-- Populate initial and subsequent snapshots
	SELECT TOP 1 @ExactStartTime = [runtime]
	FROM [dbo].[spinlock_monitor]
	WHERE [runtime] >= @StartTime
	ORDER BY [runtime] ASC;

	INSERT INTO @InitialSnapshot
	SELECT *
	FROM [dbo].[spinlock_monitor]
	WHERE [runtime] = @ExactStartTime

	SELECT TOP 1 @ExactEndTime = [runtime]
	FROM [dbo].[spinlock_monitor]
	WHERE [runtime] <= @EndTime
	ORDER BY [runtime] DESC;

	INSERT INTO @SubsequentSnapshot
	SELECT *
	FROM [dbo].[spinlock_monitor]
	WHERE [runtime] = @ExactEndTime

	INSERT INTO spinlock_monitor_timeseries ([runtime], [name], [collisions], [spins], [spins_per_collision], [sleep_time], [backoffs])
	SELECT [ts2].runtime AS runtime
		,[ts2].[name] AS [Spinlock]
		,[ts2].[collisions] AS [DiffCollisions]
		,[ts2].[spins] AS [DiffSpins]
		,[ts2].[spins_per_collision] AS [SpinsPerCollision]
		,[ts2].[sleep_time] AS [DiffSleepTime]
		,[ts2].[backoffs] AS [DiffBackoffs]
	FROM @InitialSnapshot [ts2]
	LEFT OUTER JOIN @SubsequentSnapshot [ts1] ON [ts2].[name] = [ts1].[name]
	WHERE [ts1].[name] IS NULL
	UNION
	SELECT [ts2].runtime AS runtime
		,[ts2].[name] AS [Spinlock]
		,[ts2].[collisions] - [ts1].[collisions] AS [DiffCollisions]
		,[ts2].[spins] - [ts1].[spins] AS [DiffSpins]
		,CASE ([ts2].[spins] - [ts1].[spins])
			WHEN 0
				THEN 0
			ELSE ([ts2].[spins] - [ts1].[spins]) / ([ts2].[collisions] - [ts1].[collisions])
			END AS [SpinsPerCollision]
		,[ts2].[sleep_time] - [ts1].[sleep_time] AS [DiffSleepTime]
		,[ts2].[backoffs] - [ts1].[backoffs] AS [DiffBackoffs]
	FROM @InitialSnapshot [ts2]
	LEFT OUTER JOIN @SubsequentSnapshot [ts1] ON [ts2].[name] = [ts1].[name]
	WHERE [ts1].[name] IS NOT NULL
		AND [ts2].[collisions] - [ts1].[collisions] > 0
	ORDER BY [DiffSpins] DESC;
--delete processed data
	DELETE [spinlock_monitor] WHERE [runtime] = @ExactEndTime
END
ELSE
PRINT 'No data in spinlock_monitor';
END


Once the necessary objects are in place, we can set up long-term monitoring for spinlocks. We will use SQL Server Agent to automate the data capture process, ensuring consistent and timely collection of spinlock activity for analysis.

Code example for a SQL Server Agent job:

--exec in user database context where objects created
DECLARE @CurrentTime TIME;

WHILE 1 = 1
BEGIN
	SET @CurrentTime = CAST(GETDATE() AS TIME);

	-- Exit the loop shortly before 07:00 so the next scheduled run can take over
	IF @CurrentTime >= '06:59:00' AND @CurrentTime < '07:00:00'
	BEGIN
		BREAK;
	END

	EXEC [dbo].[collect_spinlock_monitor];

	WAITFOR DELAY '00:00:15';
END


The provided code uses a WHILE loop to capture spinlock data every 15 seconds and exits the loop when the current time falls within a defined one-minute window. This stops the data capture at a predetermined time, after which a new cycle of the loop can be started by SQL Server Agent, ensuring continuous monitoring without overlapping executions. The approach is well suited to periods of high CPU utilization: the existing connection keeps capturing data without the need to establish new connections, which may be difficult or impossible under resource pressure. This allows consistent monitoring even under significant load and provides valuable insight into system performance and spinlock activity.

Now that we’re capturing spinlock snapshots every 15 seconds, the next step is to analyze the data. Calculating the differences between snapshots outside of peak hours helps us track performance trends over time. This approach allows for less strain on the server during high-traffic periods and provides a clearer view of the data changes when the system is not under heavy load. For the actual calculation, you can use a scheduled job or a manual process, depending on your preference and system requirements.

SET NOCOUNT ON;
SET LANGUAGE english;

EXEC calculation_spinlock_monitor;

Ready to dive into the data you’ve collected? Just run this simple query:

SELECT * FROM spinlock_monitor_timeseries;

This command fetches all the data from the spinlock_monitor_timeseries table. To bring this data to life, why not try a visualization tool? It can make spotting trends and patterns much more intuitive. If you’re looking for recommendations or have any questions, drop a comment below—I might just explore that in a future post!
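If you would rather stay in T-SQL, a simple aggregation over the table created earlier also works; the 24-hour window below is just an example:

--Top spinlocks by accumulated spins over the last 24 hours
SELECT TOP (10)
	[name],
	SUM([spins]) AS total_spins,
	SUM([collisions]) AS total_collisions,
	SUM([backoffs]) AS total_backoffs
FROM [dbo].[spinlock_monitor_timeseries]
WHERE [runtime] >= DATEADD(HOUR, -24, GETDATE())
GROUP BY [name]
ORDER BY total_spins DESC;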

For those hungry for more details on spinlocks, I recommend reading the comprehensive whitepaper available at Microsoft Learn: Diagnose & Resolve Spinlock Contention in SQL Server. It’s a great resource for deepening your understanding.

Slow TRUNCATE: Diagnosing SQL Server Spinlock Contention

Introduction

In the world of managing databases, sometimes we face surprising problems with systems like Microsoft SQL Server. This can be tough for both developers and administrators. Recently, we’ve seen a good example of this with an issue related to spinlocks in SQL Server. It shows us just how complex and detailed these database systems can be.

There’s a strange problem affecting systems using 1C:Enterprise, and it’s all about how TRUNCATE operations work. Usually, TRUNCATE is quick and efficient for clearing big tables, but in some cases, it’s not working as expected and is causing issues.

To start understanding this problem, there’s a really interesting article on Habr that looks into it. It’s one of the first times this issue has been publicly explored: Странное поведение MS SQL Server 2019: длительные операции TRUNCATE (which means ‘Strange Behavior of MS SQL Server 2019: Long TRUNCATE Operations’). It’s written in Russian, but don’t worry, I’ll give a summary of the key points in English right here.

Problem Description

The core of the issue lies in the unexpected and prolonged duration of TRUNCATE commands in SQL Server 2019 and newer (admittedly, I have not verified whether this issue also exists in older versions). Typically, a TRUNCATE operation is near-instantaneous. However, in the cases observed, these commands were taking significantly longer – ranging from 5 to 20 seconds, and sometimes even more.

Observations and Experiments

In summarizing the insightful findings from an external article, we’ve highlighted several key observations:

  • High CPU Consumption: A startling aspect of this issue was the substantial CPU resources consumed by the TRUNCATE operations.
  • Specific to SQL Server 2019: This peculiar behavior seems to be exclusive to SQL Server 2019. Prior versions of SQL Server did not exhibit this anomaly, pointing to a version-specific characteristic or change in the newer release.
  • Temporary Resolution: Temporarily, restarting the SQL Server seemed to mitigate the problem, but this was not a permanent fix. The issue reemerged within a few days.
  • Processor Time Correlation: The duration of the TRUNCATE commands appeared to be directly correlated with processor time.
  • Independence from Row Count: The number of rows in the temporary table did not seem to influence the duration of the TRUNCATE command, suggesting that the issue was not related to the volume of data being processed.
  • Reproduction Difficulties: The article also documented challenges in replicating the issue under test conditions, with variations in intensity and row counts in temporary tables failing to consistently recreate the problem, complicating the diagnosis.

Having outlined the significant points from the external article’s examination, we will now pivot to my investigation, where we’ll explore the root cause and present a SQL Server-side workaround that I’ve identified to effectively tackle this issue.

Troubleshooting

Confirm the existence of the problem.

When faced with an anomalous performance issue in SQL Server, especially one as elusive as the one in question, it’s crucial to adopt a systematic approach to troubleshooting. The first step in our troubleshooting process is to confirm the existence of the problem. To do this, we use SQL Server’s Extended Events, a highly configurable system that allows us to monitor and collect data about SQL Server’s operation.

Extended Event session code:

CREATE EVENT SESSION [truncate] ON SERVER 
ADD EVENT sqlserver.sql_statement_completed(SET collect_statement=(1)
    ACTION(package0.callstack,sqlos.cpu_id,sqlos.scheduler_id,sqlos.task_time,sqlserver.session_id,sqlserver.username)
    WHERE ([sqlserver].[like_i_sql_unicode_string]([statement],N'TRUNCATE TABLE%') AND [cpu_time]>=(100000)))
ADD TARGET package0.event_file(SET filename=N'truncate.xel',max_rollover_files=(0))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

We have the flexibility to run this session for a few minutes or even a few hours. However, it’s important to note that collecting data on ‘sql_statement_completed’ events might impact performance, particularly when the collection extends over a long period and on systems with high activity.
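With that caveat in mind, the session can be started for a bounded window and stopped as soon as enough data has landed in the file; a minimal sketch:

--Start the capture
ALTER EVENT SESSION [truncate] ON SERVER STATE = START;

--...let it run for the desired window, then stop it
ALTER EVENT SESSION [truncate] ON SERVER STATE = STOP;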

Results below:

From the collected data, we observed multiple occurrences of the TRUNCATE TABLE statement being executed against a temporary table.

High CPU Time: The cpu_time column, representing the amount of CPU time used by the process, shows exceedingly high values. In some instances, the CPU time reaches nearly 4 seconds (e.g., 3859000 microseconds), which is an order of magnitude higher than what one would expect from a TRUNCATE operation, typically a metadata-only operation that does not physically process each row.
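If you prefer to read the .xel file with a query instead of the SSMS viewer, something along these lines works (sys.fn_xe_file_target_read_file with a wildcard path; adjust the path to wherever the session writes its files):

--Read the captured events: statement text and CPU time (microseconds)
SELECT
	x.event_xml.value('(event/@timestamp)[1]', 'datetimeoffset') AS event_time,
	x.event_xml.value('(event/data[@name="cpu_time"]/value)[1]', 'bigint') AS cpu_time_us,
	x.event_xml.value('(event/data[@name="statement"]/value)[1]', 'nvarchar(max)') AS statement_text
FROM sys.fn_xe_file_target_read_file(N'truncate*.xel', NULL, NULL, NULL) AS f
CROSS APPLY (SELECT CAST(f.event_data AS XML) AS event_xml) AS x
ORDER BY cpu_time_us DESC;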

Spinlocks: investigating the high CPU usage.

The next step in troubleshooting the TRUNCATE TABLE performance issue is to investigate whether spinlocks might be contributing to the high CPU usage observed. Spinlocks are low-level synchronization primitives used by SQL Server to manage access to its internal structures, and contention on these spinlocks can lead to increased CPU utilization.

Below is an Extended Event session to capture spinlock_backoff events, which occur when a thread decides to back off after failing to acquire a spinlock. By collecting this data, we can determine if spinlock contention is occurring during the execution of TRUNCATE TABLE statements and if it’s a factor in the high CPU usage.

CREATE EVENT SESSION [spin_lock_backoff] ON SERVER 
ADD EVENT sqlos.spinlock_backoff(
    ACTION(package0.callstack,sqlserver.sql_text)
    WHERE ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'truncate table%')))
ADD TARGET package0.event_file(SET filename=N'spin_lock_backoff.xel',max_rollover_files=(0)),
ADD TARGET package0.histogram(SET filtering_event_name=N'sqlos.spinlock_backoff',source=N'package0.callstack',source_type=(1))
WITH (MAX_MEMORY=51200 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=PER_NODE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

Results below:


The results from the histogram target of our Extended Events session for spinlock backoffs provide some telling data. The screenshot shows a list of values, each representing a unique call stack in the SQL Server process, along with a count of how many times a backoff occurred for each call stack.

The histogram data collected from our Extended Events session has highlighted a notable frequency of spinlock backoffs associated with a specific call stack. To precisely identify the spinlock responsible for this contention, we can correlate the call stack with its corresponding spinlock name. In this case, by examining the collected event file — as opposed to the histogram — we have identified the contentious spinlock as ‘OPT_IDX_STATS’.
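The histogram target can also be read live from the DMVs while the session is running; a sketch of that query is below (the bucket value is the raw callstack representation, so the event file is still the easier place to resolve it):

--Read the in-memory histogram target of the running session
SELECT
	slot.n.value('(value)[1]', 'varchar(max)') AS callstack_bucket,
	slot.n.value('(@count)[1]', 'bigint') AS backoff_count
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS st
	ON st.event_session_address = s.address
CROSS APPLY (SELECT CAST(st.target_data AS XML) AS target_xml) AS x
CROSS APPLY x.target_xml.nodes('/HistogramTarget/Slot') AS slot(n)
WHERE s.name = N'spin_lock_backoff'
	AND st.target_name = N'histogram'
ORDER BY backoff_count DESC;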

Decoding the Call Stack

The call stack obtained from the Extended Events session has been resolved with SQLCallstackResolver to reveal the sequence of operations within SQL Server leading up to the spinlock contention. The stack trace includes several functions, culminating in the CQSIndexStatsMgr::DropIndexStatsWithWildcards method.

Resolved Call Stack:

sqllang!XeSqlPkg::CollectSqlText<XE_ActionForwarder>
sqllang!XeSqlPkg::CollectSqlTextActionInvoke
sqldk!XeSosPkg::spinlock_backoff::Publish
sqldk!SpinlockBase::Sleep
sqldk!SpinlockBase::Backoff
sqllang!Spinlock<287,1,257>::SpinToAcquireWithExponentialBackoff
sqllang!CQSIndexStatsMgr::PqsisAdvanceInBucket
sqllang!CQSIndexStatsMgr::DropIndexStatsWithWildcards
sqllang!CQSIndexRecMgr::InvalidateMissingIndexes
sqllang!QPEvent::ModifyObject
sqllang!CStmtTruncateTable::XretExecute
sqllang!CMsqlExecContext::ExecuteStmts<1,0>
sqllang!CMsqlExecContext::FExecute
sqllang!CSQLSource::Execute
sqllang!process_request
sqllang!process_commands_internal
sqllang!process_messages
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
KERNEL32+0x17974

From the call stack, it appears that after a TRUNCATE TABLE operation, SQL Server proceeds to invalidate cache entries for missing index suggestions. This process requires access to the OPT_IDX_STATS spinlock to safely update the cache. If multiple TRUNCATE TABLE statements are executed concurrently, or in quick succession, it can lead to contention for this spinlock as each transaction tries to update the same cache concurrently.
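As a side check — and this is my assumption rather than anything confirmed by the call stack — the size of the missing-index cache gives a feel for how much work each invalidation has to do. A quick way to gauge it:

--Current number of missing-index suggestions cached per database
SELECT DB_NAME(d.database_id) AS database_name,
	COUNT(*) AS missing_index_entries
FROM sys.dm_db_missing_index_details AS d
GROUP BY d.database_id
ORDER BY missing_index_entries DESC;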

Resolution

The workaround for this issue is to enable Trace Flag (TF) 2392, which suppresses the collection of missing index suggestions.

This trace flag was mentioned in KB4042232 – FIX: Access violation when you cancel a pending query if the missing indexes feature is enabled in SQL Server – Microsoft Support
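For reference, a minimal sketch of enabling the flag globally at runtime; adding -T2392 to the startup parameters keeps it persistent across restarts (test the change in a non-production environment first):

--Enable TF 2392 globally at runtime
DBCC TRACEON (2392, -1);

--Confirm the flag is active
DBCC TRACESTATUS (2392, -1);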

After applying the trace flag, we resumed monitoring with the ‘truncate’ Extended Event session. Significantly, we observed no instances of TRUNCATE events exceeding 100ms in CPU time. Furthermore, as a beneficial side effect on the production system, we noted a decrease in CPU utilization during business hours, dropping from an average of ~40% to about 30%.

My Perspective

The behavior observed in SQL Server appears to be ‘by design.’ However, it’s perplexing why there’s a need to invoke ‘InvalidateMissingIndexes’ when truncating a temporary table. Considering potential solutions, the Microsoft product group might explore the following options:

  1. Re-evaluating Necessity: Determine the actual need for invalidating missing indexes during the truncation of temporary tables. If found unnecessary, this step could be bypassed to avoid the issue.
  2. Trace Flag 2392 Utilization: This flag currently suppresses the collection of missing indexes for the entire instance. A more targeted approach would be to allow enabling it at the level of a specific database. With many new features being tested in Azure first, this could be a feasible option to explore.
  3. Spinlock Partitioning: If avoiding the call to ‘InvalidateMissingIndexes’ isn’t feasible and controlling it at the database level is not an option, another possibility could be to partition the OPT_IDX_STATS spinlock. This might alleviate contention and improve performance.

These suggestions aim to address the root of the problem while maintaining the integrity and functionality of SQL Server’s operations.