Sql-server – Query in procedure occasionally does not finish

profilersql server

I have a question about how to debug very strange behavior (nested query getting stuck) in SQL Server 2008 R2 (Standard Edition) running a query.

Sometimes (unfortunately not always) when I run a complex procedure that does some data transformations involving:

-- Update of statistics.
EXEC sp_updatestats;

-- Run of another procedure with single complex query
-- procedure declared with recompile.
EXEC usp_query_procedure;

I run into problem that this procedure is not completing (typical time < 1s, when stuck the running time can be days, never finished yet).

If I take out the query text and run it separately it finishes in under a second, but if I run it inside the procedure it gets stuck.

When these symptoms started I didn't have either WITH RECOMPILE or sp_updatestats.
I have googled around and found that cached bad execution plan (for some input paramters) may be used and that with recompile will help (provided that statistics are updated). While these two options seem to have mitigated the problem (it occurrs less often) it sure did not go away completely. Why didnt it help in all cases? What else can cause this problem?

When the procedure is "stuck" and I attach a SQL Profiler I can see thousands of lock acuqires/releases every refresh tick. When the query runs normally there is only fraction of used locks, but thats probably obvious.

What is also very strange is that this happens only on one (Standard Edition) of four servers running the same application (mix of editions Standard / Enterprise).

How can I go about debugging this further? What events should I log in profiler to be able to see what is causing the problem? Did something similar happened to sombody else here?

Please even a hint on what to google, what to look for, what to trace can lead to a solution. I will be sure to post my findings here for future generation If I manage to solve the problem.

Best Answer

I wouldn't bother with using Profiler at this point. You're likely just exasperating the issue, since streaming that back through a UI is not exactly light on resources. If you want to trace, set up a server-side trace and inspect the results afterward.

But first, let's start simpler. When the procedure is running in its "stuck" state, look at the DMV sys.dm_exec_requests. You want to see what it is waiting on (wait_type, last_wait_type) and whether it is being blocked by another session (blocking_session_id). You can also see if certain SET settings (quoted_identifier, arithabort, etc.) are different for the session than they are for the plan (which you can get in a roundabout way from sys.dm_exec_plan_attributes) and for the stored procedure itself - some of which you can get using OBJECTPROPERTY().

For example, let's compare just a few settings for the procedure, the session and the plan. First, while the query is running forever (and assuming you know the spid; I'll assume here it is 67), check these values from sys.dm_exec_requests and sys.dm_exec_sessions:

SELECT [status], wait_type, last_wait_type, plan_handle, blocking_session_id
  FROM sys.dm_exec_requests 
  WHERE session_id = 67;

SELECT [quoted_identifier], [arithabort], [ansi_nulls], [date_format]
  FROM sys.dm_exec_sessions 
  WHERE session_id = 67;

Now, this might immediately tell you where to look for the problem : you may have a wait_type that indicates there is an I/O problem, or you might see that the session is being blocked - in which case you can investigate what that session (blocking_session_id) is doing.

If neither of those pieces of information are useful, let's move on to the plan. A common issue with plans is that they can be different depending on various SET options, and this can result in quite different performance for two different users or an application and SSMS.

The plan handle is going to be a big 0x00... varbinary value in the results above. You can copy and paste that into this query:

SELECT 
  [quoted_identifier] = MAX(CASE WHEN 
      CASE WHEN attribute = 'set_options' THEN 
        64 & CONVERT(INT, a.value) END = 64 THEN 1 ELSE 0 END),
  [arithabort] = MAX(CASE WHEN 
      CASE WHEN attribute = 'set_options' THEN 
        4096 & CONVERT(INT, a.value) END = 4096 THEN 1 ELSE 0 END),
  [ansi_nulls] = MAX(CASE WHEN 
      CASE WHEN attribute = 'set_options' THEN 
        32 & CONVERT(INT, a.value) END = 32 THEN 1 ELSE 0 END),
  [date_format] = MAX(CASE WHEN attribute = 'date_format' THEN 
        CONVERT(INT, a.value) END)
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_plan_attributes(p.plan_handle) AS a

WHERE p.plan_handle = 0x... --<-- *** put the plan handle here ***

AND a.attribute IN ('set_options', 'date_format');

And you can compare those results with the session values you pulled above, and also some of these properties are stored with the procedure itself:

SELECT OBJECTPROPERTY(OBJECT_ID('dbo.usp_query_procedure'), 'IsQuotedIdentOn');
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.usp_query_procedure'), 'ExecIsQuotedIdentOn');

You can also repeat that query for any other plans you find for the same procedure, to see if there are in fact cases where the procedure is executed with different settings:

SELECT 
  p.plan_handle,
  [quoted_identifier] = MAX(CASE WHEN 
      CASE WHEN attribute = 'set_options' THEN 
        64 & CONVERT(INT, a.value) END = 64 THEN 1 ELSE 0 END),
  [arithabort] = MAX(CASE WHEN 
      CASE WHEN attribute = 'set_options' THEN 
        4096 & CONVERT(INT, a.value) END = 4096 THEN 1 ELSE 0 END),
  [ansi_nulls] = MAX(CASE WHEN 
      CASE WHEN attribute = 'set_options' THEN 
        32 & CONVERT(INT, a.value) END = 32 THEN 1 ELSE 0 END),
  [date_format] = MAX(CASE WHEN attribute = 'date_format' THEN 
        CONVERT(INT, a.value) END)
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_plan_attributes(p.plan_handle) AS a
INNER JOIN sys.dm_exec_procedure_stats AS s
ON p.plan_handle = s.plan_handle
AND s.[object_id] = OBJECT_ID('dbo.usp_query_procedure')
AND a.attribute IN ('set_options', 'date_format')
GROUP BY p.plan_handle; 

If you see more than one result, then check every column. Any columns where one plan shows a 1 and the other plan shows a 0 would be suspicious to me.

In the meantime you can read a few great posts about this and similar issues:

http://www.sommarskog.se/query-plan-mysteries.html

http://www.simple-talk.com/sql/t-sql-programming/dmvs-for-query-plan-metadata/

http://www.benjaminnevarez.com/2011/09/parameter-sniffing-and-plan-reuse-affecting-set-options/

And yes, sp_WhoIsActive will probably give you some insight into what is going on as well.