I had the pleasure of presenting at the SSWUG Spring 2016 Virtual Conference today on the topic of SQL 2016 Temporal Tables. Thank you to all who attended! Here are the slides and demo scripts for that presentation.
This month’s TSQL Tuesday prompt is on the topic of assumptions. I’d like to take a brief moment and talk about one of the more insidious and pervasive sources of bugs in TSQL application code I’ve seen over the years.
Let’s say I present you with a date: 08/12/2014
Quick! What month am I talking about?
If you answered August 12, 2014, you might be correct. If you answered December 8, 2014, you might also be correct. If you sneakily answered “It depends!” then, well, you’ve picked up on where I am going with this.
I very frequently see coders assuming that their local timezone, region, collation, and language are somehow universal, leading to coding practices which limit the geographical reach of their software. Here are some of the most common things which I see overlooked:
- Regional date format – Keep in mind that mm/dd/yyyy and dd/mm/yyyy cannot be easily distinguished until the 13th of any given month. Usage varies around the world and even within countries. Use yyyymmdd instead!
- Regional number formatting – Though this a larger problem in app code than in TSQL code, you still see the issue sometimes. A decimal may be represented as 3.14 or 3,14 depending on local settings. There are more variations than you might expect.
- Time zones – You would hope it could go without saying that not all users will necessarily be located in the same time zone, but applications have issues with this all the time. Time zones are also not static and are affected by Daylight Savings Time windows which can shift or be cancelled outright. Storing data in UTC time and applying offsets in the presentation layer can usually solve this problem. And don’t forget – not all time zones are on even hour boundaries!
- Collation settings – Honestly, the thing I see the most often is the assumption of a case insensitive collation. This problem bit me so many times in my early days working on SQL Diagnostic Manager that I switched my entire development environment to case sensitive many years ago. I also put Korean, Russian, Japanese, and Binary collations into my test environments. It’s better to write your code to handle these issues early on than to play whack-a-mole with collation conflict errors later on.
You may feel safe with these assumptions in quick little scripts now and then, but remember, the moment you put a script up on your blog, it has gone out to an international audience who may have different settings from you. You may think to yourself that all of your company’s servers are located two floors away and are all of the same predictable region, collation, and language settings – but then you attend a quarterly kick of meeting and find out you’ve merged with a company on another continent and, by the way, you’re now responsible for their servers too!
Save yourself some heartache and add a nice sparkle of professionalism to your code by writing it to be region and language agnostic. You’ll be a better coder for it.
This afternoon I was asked a question that comes up once or twice a year by email. I answered as usual, and only later realized that the topic was a reasonably good match for this month’s TSQL Tuesday topic. I thought I might share the question, and the answer, with a larger audience this time.
Every once in a while I’ll receive a question about a monitoring batch whose spid has gotten hung, and attempts to kill the session simply leave the spid in a rollback state for hours or days on end. Almost invariably these sessions are presenting with an external wait type – usually either MSQL_XP or one of the PREEMPTIVE_OS_* wait types. The usual suspects are calls to xp_fixeddrives and other extended stored procedures backed by xpstar.dll and the sp_oa* procedures that support OLE Automation.
Here is an example of a query which will cause the problem. (Note that enablement of xp_cmdshell is a security hazard and thus this code should only be attempted on a test machine)
exec xp_cmdshell 'powershell -command "sleep 1"'
When executed, this command will execute for exactly 100 seconds. Executing the following code will yield some information about what the session is doing.
select session_id, wait_type, command from sys.dm_exec_requests where session_id = <session_id>
If the session running xp_cmdshell is killed, the session will move to a killed/rollback state, but retain the same wait type.
It will remain in this state until the 100 seconds elapse.
Now, this is all well and good for a session that will die after 100 seconds, but what about a session which is truly and intractably hung? Or what if it was a 10,000 second wait? Is there anything to be done?
Well, it depends. In this specific case, yes. The xp_cmdshell extended stored procedure is running its command out of process to SQL Server. In fact, you can see the life of the session using Sysinternals ProcMon. Here I’ve reduced the wait to 1 second just to capture it in a neat window.
SQLServer creates the cmd.exe process and passes it the command, which in turn calls powershell. Each of these processes is a full fledged process visible in Process Explorer.
So in this particular case, if the sleep command was for 10,000 seconds, after the session kill failed, it would be possible to end the session by killing powershell.exe using Process Explorer – and that does indeed work.
However, there are other external waits which cannot be killed in this way, and these are generally a bit more gnarly to work with. For example, on my machine, it takes about 7 second for the following command to run:
And if I watch it in procmon, I can see that it is SQL Server’s own process which is walking the directory tree.
Now, this command is having no trouble, and I can kill it quite easily. But as you can see, there’s no process other than SQL Server itself involved.
What I usually see is that either xp_fixeddrives, xp_readerrorlogs, xp_enumerrorlogs, or one of the OLE Automation stored procedures gets snagged up while running in process to SQL Server. There’s nothing to kill other than SQL Server itself in this case, and the process generally has to remain a zombie until the underlying problem is solved and/or the SQL Server service is cycled.
In the case of OLE Automation stored procedures specifically, it is possible to run these commands out-of-process to SQL Server just as in the example with xp_cmdshell above. Sometimes this can be done simply by setting the context flag of sp_OACreate. In other cases, it may be necessary to fiddle with registry settings to change the threading model of the assembly being loaded from “Apartment” to “Both” – something which, it goes without saying, is best tried in a test environment first.
So, that’s my experience with stuck spids and external wait types. If you’re lucky, when you get stuck with one of these, it’s an out of process wait. If you’re not… well, you may be waiting for a while longer.
Over the summer it’s been my privilege to speak on the subject of Care and Feeding of Your System Databases at the CACTUSS SQL Server user groups in Austin as well as at SQLSaturday #222 in Sacramento. With the assistance of Idera and MSSQLTips.com, that presentation is also available online for viewing. Here is the abstract:
Master, model, msdb, tempdb. You see them on all of your servers, but what do you really know about them?
If you ever wanted to learn about the SQL Server system databases, their purpose, and the maintenance and configuration required to keep your server running its best, then this session is for you!
Learn best practices and techniques, including how to restore master and how to configure tempdb to reduce contention. You’ll leave with a better understanding of these shared resources and how they affect your database applications.
For the past few months I have been chasing down an issue which has been plaguing my attempts at tempdb monitoring, and in just the last week I’ve received news that it is indeed a confirmed bug in SQL Server. I’m sharing what I’ve found here for the benefit of the SQL community.
One of the features I worked on for SQL diagnostic manager was our tempdb monitoring feature, whereby users can track and alert on a number of tempdb-specific metrics. Among the items collected are the tempdb page allocation and deallocation values from sys.dm_db_session_space_usage and sys.dm_db_task_space_usage – specifically, the following columns:
The details of how to use these values in troubleshooting are covered in this Troubleshooting Insufficient Disk Space in tempdb article by Microsoft. In brief, these values are intended to track the number of pages allocated or deallocated by a task or session, which can allow the practitioner to identify sessions which are using excessive amounts of space in tempdb or which are churning through a large number of pages over their lifetime.
We were periodically seeing cases where the apparent space used by a session – as calculated by subtracting the deallocation count from the allocation count for a session – was well in excess of the total size of tempdb. I worked closely with Robert Davis (b|t) to identify whether this was a bug in the product or a misunderstanding of the contents of the DMV, but he confirmed our logic and believed it was probably a bug. In deeper testing I was able to reproduce the issue very consistently using simple batches. I’d like to walk you through the process.
In this example, a temp table is being created and then rows are being added. I’ve noted the line where a “go 10” is used – this causes the script to fill the temp table with 10 rows before dropping it. The script waits 10 seconds, then checks the allocation and deallocation counts from sys.dm_db_session_space_usage and sys.dm_db_task_space_usage.
set nocount on create table #BigTempTable(a nchar(4000) default 'a') go insert into #BigTempTable default values -- Note this value go 10 drop table #BigTempTable go waitfor delay '00:00:10' select 'Session Space', ssu.session_id, sessionUserPagesAlloc = sum(ssu.user_objects_alloc_page_count), sessionUserPagesDealloc = sum(ssu.user_objects_dealloc_page_count), taskUserPagesAlloc = sum(tsu.user_objects_alloc_page_count), taskUserPagesDealloc = sum(tsu.user_objects_dealloc_page_count), sessionInternalPagesAlloc = sum(ssu.internal_objects_alloc_page_count), sessionInternalPagesDealloc = sum(ssu.internal_objects_dealloc_page_count), taskInternalPagesAlloc = sum(tsu.internal_objects_alloc_page_count), taskInternalPagesDealloc = sum(tsu.internal_objects_dealloc_page_count) from tempdb.sys.dm_db_session_space_usage ssu left join tempdb.sys.dm_db_task_space_usage tsu on ssu.session_id = tsu.session_id where ssu.session_id = @@spid group by ssu.session_id
When run as above, with go 10, the DMV data returns exactly what is expected (I have switched rows and columns below for easy readability).
Each of the 10 loops adds 1 row to the temp table, and at that row size each row is 1 page. The first 8 pages come from mixed extents, and pages 9 and 10 come from a second extent, for a total of 16 pages allocated to the table. Upon dropping the table, all of that space is deallocated, so we see both an allocation and a deallocation of 16 pages.
If I reset the connection so I have a fresh spid and change the “go 10” to “go 5000,” however, the returned data begins looking a bit suspect.
At 5000 rows, this data divides up nicely into 625 extents, so exactly 5000 pages are allocated. However, no pages are marked as deallocated even though the object was dropped.
If I remove the “where ssu.session_id = @@spid” clause and look at the space utilization of all sessions, I find that the deallocation value is apparently being credited to a system spid – in this case, spid 26:
So this is the cause of the incorrect values for session space used. These two contrived examples show a small and a large temp table, and show that at a certain point the deallocated pages are being credited to a system spid. In practice, however, workloads have a mix of small and large temporary objects, so the deallocation value is often non-zero but still incorrect.
When we first observed this problem Robert mentioned that he thought it was probably an issue to do with deferred drops in tempdb. In the last week I’ve heard back from Microsoft, who have confirmed this. Specifically, the problem is that deferred drops of over 128 extents are not incrementing the user_objects_dealloc_page_count in sys.dm_db_session_space_usage. Hop on over to this post on Robert’s blog, where he describes this issue and the concept of deferred drops in great detail.
Just wanted to share these findings with the community. To my reading, these DMVs remain useful to give overall impressions of how much space is being used by sessions and can be used to identify the sessions which are using tempdb most heavily over time. While this bug is in place, however, it is not possible to state that subtracting deallocations from allocations will yield current usage for a specific session.
Many thanks to the folks at SQLClinic at PASS Summit, to the CSS escalation team at Microsoft, and to Robert Davis for helping identify this issue. Don’t miss Robert’s post, which has some important details about how deferred drops work.
After taking a break from speaking for a while, I’m happy to announce that I’ll be presenting at the CACTUSS Central and CACTUSS North meetings in Austin, TX on May 20 and May 21, respectively. I’ll be bringing a new presentation, “Care and Feeding of Your System Databases.” We’ll go over the SQL Server system databases, what they’re used for, configuration and maintenance best practices and antipractices, and a few troubleshooting scenarios.
If you’ve got questions you’d like answered feel free to drop me a line ahead of time and I’ll try to cover them in the presentation – otherwise, hope to see you there!
As a student I was always a great fan of the International Obfuscated C Code Contest. If you’re not familiar with obfuscation contests, they’re basically a two part brain teaser for coders. The first part, and the one I find really satisfying, is to write code to do some task in a completely non-obvious way such that it is very tricky to work out what the end result will be. The second part is to take someone else’s obfuscated code and try to work out what they’re doing and how. There’s probably a joke in there about code maintainability, but at their best, code obfuscation puzzles have taught me a lot about the languages I work in.
For the past two years I have taken a stab at an obfuscated T-SQL code holiday greeting card. The code displays poorly on mobile devices and some browsers, so I’ve posted images here (the source, posted again below, is here). Here is this year’s card:
And here’s last year’s:
I won’t spoil the fun by posting their output, but if you’d like to give them a try, you can download the source code to both here.
Hope you enjoy them, and have a wonderful holiday and a happy and prosperous new year!
I thought it might be useful to start a semi-regular series of entries on troubleshooting for the SQL Server developer. Google has come to my rescue lo, these many times: this is my way of returning the favor. For this first entry I thought I’d tackle one of the more mysterious-looking runtime errors I see from time to time.
It usually goes about like this – code that has been running fine for ages, or which works fine for everyone else in the universe, suddenly starts bailing out with a runtime error. The error, when located, looks like this:
Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
It’s really a frightful-looking message, and worse, double-clicking on the message in SSMS does not jump your cursor to the source of the error. Well great.
This is where your mileage may vary, but for me, I generally see this error crop up in code which follows this pattern:
- The batch is utilizing an extended stored procedure
- The output of the extended stored procedure is being insert into a table
By way of example, here’s a snippet of code that will reliably cause the error:
create table #ErrorLog (LogDate datetime, ProcessInfo nvarchar(50), LogText nvarchar(500)) insert into #ErrorLog exec xp_readerrorlog 'authentication mode' drop table #ErrorLog
In this case the output of xp_readerrorlog is being insert into a temporary table. However, if you’re familiar with xp_readerrorlog you’ll recognize that there’s something not quite right going on here. If you take the outer insert out of the equation and run the procedure by itsef, the error message becomes clearer:
exec xp_readerrorlog 'authentication mode'
The output in this case is:
Msg 22004, Level 12, State 1, Line 0
Error executing extended stored procedure: Invalid Parameter Type
Yes, indeed, ‘authentication mode’ is not a valid parameter value here: the procedure is expecting an integer. That’s much clearer than “a severe error.” Fixing that bug solves the issue.
create table #ErrorLog (LogDate datetime, ProcessInfo nvarchar(50), LogText nvarchar(500)) insert into #ErrorLog exec xp_readerrorlog 1,1,'authentication mode' drop table #ErrorLog
This is of course a contrived example and this code would never have worked, but often in production these errors are more subtle. I have seen the inner exception be an overflow, a permissions problem, or even outright corruption, and often it’s not as clear-cut and perfectly reproducible as this example. For instance, the stored procedure in the outer batch may be a regular system or user procedure which in turn calls an extended stored procedure, but only with certain codepaths. You may really need to peel the onion on this one.
Certainly I don’t believe this is the only error case that can lead to this particular message, but for what it’s worth, in my experience looking for the XSP has generally put me on the right path toward quashing the bug. If you have any other experience with this error, please let me know either through my contact page or through the comments.
A month from today on April 14, 2012 I’ll be speaking at SQLSaturday #111 in Atlanta, GA (or, more precisely, in Alpharetta, GA). This will be my first trip to Georgia and I’m delighted to be able to present alongside some of the biggest names in the SQL Server community. It’s a great honor to be chosen and I am so looking forward to the event.
For this SQLSaturday I’ll be giving my presentation Introduction to Common Table Expressions. In this example-heavy session I offer an introduction to T-SQL common table expression syntax and cover the basics of recursive expressions, with an eye toward when and how this style of coding can improve maintainability. I’ll also touch upon when it makes sense to use recursive CTEs (spoilers: not very often).
I look forward to meeting many of you at SQLSaturday #111. You can see this and all of my upcoming and past SQL Server events here.