For the past few months I have been chasing down an issue which has been plaguing my attempts at tempdb monitoring, and in just the last week I’ve received news that it is indeed a confirmed bug in SQL Server. I’m sharing what I’ve found here for the benefit of the SQL community.
One of the features I worked on for SQL diagnostic manager was our tempdb monitoring feature, whereby users can track and alert on a number of tempdb-specific metrics. Among the items collected are the tempdb page allocation and deallocation values from sys.dm_db_session_space_usage and sys.dm_db_task_space_usage – specifically, the following columns:
The details of how to use these values in troubleshooting are covered in this Troubleshooting Insufficient Disk Space in tempdb article by Microsoft. In brief, these values are intended to track the number of pages allocated or deallocated by a task or session, which can allow the practitioner to identify sessions which are using excessive amounts of space in tempdb or which are churning through a large number of pages over their lifetime.
We were periodically seeing cases where the apparent space used by a session – as calculated by subtracting the deallocation count from the allocation count for a session – was well in excess of the total size of tempdb. I worked closely with Robert Davis (b|t) to identify whether this was a bug in the product or a misunderstanding of the contents of the DMV, but he confirmed our logic and believed it was probably a bug. In deeper testing I was able to reproduce the issue very consistently using simple batches. I’d like to walk you through the process.
In this example, a temp table is being created and then rows are being added. I’ve noted the line where a “go 10” is used – this causes the script to fill the temp table with 10 rows before dropping it. The script waits 10 seconds, then checks the allocation and deallocation counts from sys.dm_db_session_space_usage and sys.dm_db_task_space_usage.
set nocount on create table #BigTempTable(a nchar(4000) default 'a') go insert into #BigTempTable default values -- Note this value go 10 drop table #BigTempTable go waitfor delay '00:00:10' select 'Session Space', ssu.session_id, sessionUserPagesAlloc = sum(ssu.user_objects_alloc_page_count), sessionUserPagesDealloc = sum(ssu.user_objects_dealloc_page_count), taskUserPagesAlloc = sum(tsu.user_objects_alloc_page_count), taskUserPagesDealloc = sum(tsu.user_objects_dealloc_page_count), sessionInternalPagesAlloc = sum(ssu.internal_objects_alloc_page_count), sessionInternalPagesDealloc = sum(ssu.internal_objects_dealloc_page_count), taskInternalPagesAlloc = sum(tsu.internal_objects_alloc_page_count), taskInternalPagesDealloc = sum(tsu.internal_objects_dealloc_page_count) from tempdb.sys.dm_db_session_space_usage ssu left join tempdb.sys.dm_db_task_space_usage tsu on ssu.session_id = tsu.session_id where ssu.session_id = @@spid group by ssu.session_id
When run as above, with go 10, the DMV data returns exactly what is expected (I have switched rows and columns below for easy readability).
Each of the 10 loops adds 1 row to the temp table, and at that row size each row is 1 page. The first 8 pages come from mixed extents, and pages 9 and 10 come from a second extent, for a total of 16 pages allocated to the table. Upon dropping the table, all of that space is deallocated, so we see both an allocation and a deallocation of 16 pages.
If I reset the connection so I have a fresh spid and change the “go 10” to “go 5000,” however, the returned data begins looking a bit suspect.
At 5000 rows, this data divides up nicely into 625 extents, so exactly 5000 pages are allocated. However, no pages are marked as deallocated even though the object was dropped.
If I remove the “where ssu.session_id = @@spid” clause and look at the space utilization of all sessions, I find that the deallocation value is apparently being credited to a system spid – in this case, spid 26:
So this is the cause of the incorrect values for session space used. These two contrived examples show a small and a large temp table, and show that at a certain point the deallocated pages are being credited to a system spid. In practice, however, workloads have a mix of small and large temporary objects, so the deallocation value is often non-zero but still incorrect.
When we first observed this problem Robert mentioned that he thought it was probably an issue to do with deferred drops in tempdb. In the last week I’ve heard back from Microsoft, who have confirmed this. Specifically, the problem is that deferred drops of over 128 extents are not incrementing the user_objects_dealloc_page_count in sys.dm_db_session_space_usage. Hop on over to this post on Robert’s blog, where he describes this issue and the concept of deferred drops in great detail.
Just wanted to share these findings with the community. To my reading, these DMVs remain useful to give overall impressions of how much space is being used by sessions and can be used to identify the sessions which are using tempdb most heavily over time. While this bug is in place, however, it is not possible to state that subtracting deallocations from allocations will yield current usage for a specific session.
Many thanks to the folks at SQLClinic at PASS Summit, to the CSS escalation team at Microsoft, and to Robert Davis for helping identify this issue. Don’t miss Robert’s post, which has some important details about how deferred drops work.