SQL Server – Intermittent RESOURCE_SEMAPHORE_QUERY_COMPILE Wait Stats

sql-server wait-types

I'm trying to troubleshoot some intermittent CPU spikes that we're seeing on one of our production SQL Servers. We're running SQL Server 2008 R2 Standard Edition with 28 GB RAM and 4 CPU cores. When this happens, we notice a large number of RESOURCE_SEMAPHORE_QUERY_COMPILE waits, which last for about a minute or two and then stop, after which CPU usage returns to normal.

After researching this, I understand that it is normally caused by compiling a large number of non-reusable execution plans, and we are currently working on application changes to address that.

Can this behavior also be triggered by plan cache evictions due to memory pressure? If so, how would I check for it? I'm trying to see if there are any short-term remedies, such as adding RAM to the server, until we deploy our application fixes. The only other short-term option I can think of is to move some of our busiest databases to a different server.

Best Answer

I believe you will see this symptom if you have a LOT of large query plans fighting for memory in order to compile (this has very little to do with running the query itself). To hit this, I suspect you are using an ORM or some other application that generates many unique but relatively complex queries. SQL Server could be under memory pressure because of things like large query operations, but it is more likely that your system is simply configured with far less memory than it needs: either there is never enough memory to satisfy all of the queries you're trying to compile, or other processes on the box are stealing memory from SQL Server.

You can take a look at what SQL Server is configured with using:

EXEC sp_configure 'max server memory';    -- max configured in MB

SELECT counter_name, cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name IN
  (
    'Total Server Memory (KB)',    -- memory currently committed
    'Target Server Memory (KB)'    -- how much SQL Server would like to commit
  );

You can identify the cached plans that required the most compile memory with the following Jonathan Kehayias query, adapted slightly:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10) CompileTime_ms, CompileCPU_ms, CompileMemory_KB,
  qs.execution_count,
  qs.total_elapsed_time/1000.0 AS duration_ms,
  qs.total_worker_time/1000.0 as cputime_ms,
  (qs.total_elapsed_time/qs.execution_count)/1000.0 AS avg_duration_ms,
  (qs.total_worker_time/qs.execution_count)/1000.0 AS avg_cputime_ms,
  qs.max_elapsed_time/1000.0 AS max_duration_ms,
  qs.max_worker_time/1000.0 AS max_cputime_ms,
  SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
    (CASE qs.statement_end_offset
      WHEN -1 THEN DATALENGTH(st.text) ELSE qs.statement_end_offset
     END - qs.statement_start_offset) / 2 + 1) AS StmtText,
  query_hash, query_plan_hash
FROM
(
  SELECT 
    c.value('xs:hexBinary(substring((@QueryHash)[1],3))', 'varbinary(max)') AS QueryHash,
    c.value('xs:hexBinary(substring((@QueryPlanHash)[1],3))', 'varbinary(max)') AS QueryPlanHash,
    c.value('(QueryPlan/@CompileTime)[1]', 'int') AS CompileTime_ms,
    c.value('(QueryPlan/@CompileCPU)[1]', 'int') AS CompileCPU_ms,
    c.value('(QueryPlan/@CompileMemory)[1]', 'int') AS CompileMemory_KB,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS n(c)
) AS tab
JOIN sys.dm_exec_query_stats AS qs ON tab.QueryHash = qs.query_hash
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY CompileMemory_KB DESC
OPTION (RECOMPILE, MAXDOP 1);

You can see how the plan cache is being used with the following:

SELECT objtype, cacheobjtype,
    AVG(size_in_bytes*1.0)/1024.0/1024.0 AS avg_mb,
    MAX(size_in_bytes)/1024.0/1024.0     AS max_mb,
    SUM(size_in_bytes)/1024.0/1024.0     AS total_mb,
    COUNT(*)                             AS plan_count
FROM sys.dm_exec_cached_plans
GROUP BY GROUPING SETS ((),(objtype, cacheobjtype))
ORDER BY objtype, cacheobjtype;

When you are experiencing high semaphore waits, check to see if these query results vary significantly from during "normal" activity:

SELECT resource_semaphore_id, -- 0 = regular, 1 = "small query"
  pool_id,
  available_memory_kb,
  total_memory_kb,
  target_memory_kb
FROM sys.dm_exec_query_resource_semaphores;

SELECT StmtText = SUBSTRING(st.[text], (qs.statement_start_offset / 2) + 1,
        (CASE qs.statement_end_offset
          WHEN -1 THEN DATALENGTH(st.text) ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1),
  r.start_time, r.[status], DB_NAME(r.database_id), r.wait_type, 
  r.last_wait_type, r.total_elapsed_time, r.granted_query_memory,
  m.requested_memory_kb, m.granted_memory_kb, m.required_memory_kb,
  m.used_memory_kb
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_exec_query_stats AS qs
ON r.plan_handle = qs.plan_handle
INNER JOIN sys.dm_exec_query_memory_grants AS m
ON r.request_id = m.request_id
AND r.plan_handle = m.plan_handle
CROSS APPLY sys.dm_exec_sql_text(r.plan_handle) AS st;

And you may also want to look and see how memory is distributed:

DBCC MEMORYSTATUS;
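If you prefer a queryable result set over parsing the lengthy DBCC MEMORYSTATUS output, a rough sketch of the same information can be pulled from sys.dm_os_memory_clerks (column names below are the SQL Server 2008 R2 versions; later releases renamed the page columns):

```sql
-- Top memory clerks by pages allocated (2008 R2 column names:
-- single_pages_kb / multi_pages_kb).
SELECT TOP (10)
    [type],
    SUM(single_pages_kb + multi_pages_kb) / 1024.0 AS allocated_mb
FROM sys.dm_os_memory_clerks
GROUP BY [type]
ORDER BY allocated_mb DESC;
```

A large CACHESTORE_SQLCP figure relative to the buffer pool (MEMORYCLERK_SQLBUFFERPOOL) would be consistent with ad hoc plan bloat.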

And there is some good information here about why you might be seeing a high number of compiles/recompiles (which will contribute to that wait):

http://technet.microsoft.com/en-us/library/ee343986(v=sql.100).aspx

http://technet.microsoft.com/en-us/library/cc293620.aspx

You can check for high compile/recompile counts using the following counters:

SELECT counter_name, cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name IN 
  (
    'SQL Compilations/sec',
    'SQL Re-Compilations/sec'
  );

And you can check for internal memory pressure leading to evictions - non-zero values in the removed-entry counters here (removed_all_rounds_count, removed_last_round_count) indicate that plans are being evicted from the plan cache:

SELECT * FROM sys.dm_os_memory_cache_clock_hands 
  WHERE [type] IN (N'CACHESTORE_SQLCP', N'CACHESTORE_OBJCP');

NOTE Most of these metrics don't have a magic "oh my gosh, I need to panic or do something!" threshold. What you need to do is take measurements during normal system activity and determine where those thresholds lie for your hardware, configuration and workload. The time to do something is when two conditions are true:

  1. the metrics vary significantly from normal values; and,
  2. there is actually a performance problem occurring (like your CPU spikes) - and only if it is actually interfering with something. Other than seeing the CPUs spike, are you seeing any other symptoms? In other words, is the spike the symptom, or is the spike causing other symptoms? Would users of the system ever notice? A lot of people go after their highest wait consumer simply because it's the highest. Something is always going to be the highest wait consumer; you have to know that it's varying enough from normal activity to indicate a problem or some significant change.
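One way to build that baseline is to snapshot sys.dm_os_wait_stats on a schedule into a table you can compare against during an incident. A minimal sketch (the table name WaitStatsBaseline is made up for illustration):

```sql
-- Persist periodic snapshots of wait stats so "normal" has numbers behind it.
IF OBJECT_ID(N'dbo.WaitStatsBaseline') IS NULL
    CREATE TABLE dbo.WaitStatsBaseline
    (
        capture_time        datetime      NOT NULL DEFAULT GETDATE(),
        wait_type           nvarchar(60)  NOT NULL,
        waiting_tasks_count bigint        NOT NULL,
        wait_time_ms        bigint        NOT NULL,
        signal_wait_time_ms bigint        NOT NULL
    );

INSERT dbo.WaitStatsBaseline
    (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;
```

Run from a job every few minutes; the deltas between consecutive rows for RESOURCE_SEMAPHORE_QUERY_COMPILE tell you whether a spike is genuinely abnormal.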

Optimize for ad hoc workloads is a great setting for 99% of the workloads out there, but it will not be very helpful in reducing compilation costs - it aims to reduce plan cache bloat by storing only a small stub, instead of the full plan, until a query has been executed twice. Even when only the stub is cached, you still have to compile the full plan to execute the query. Perhaps what @Kahn meant to recommend was setting the database-level parameterization option to forced, which can provide better plan re-use (but it really depends on how unique all of these high-cost queries are).
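For reference, both settings are each a single statement to enable; test forced parameterization carefully on a non-production copy first, since it changes plan selection for every query in the database (AdventureWorks below stands in for your database name):

```sql
-- Instance-wide: cache only a small stub the first time an ad hoc plan is seen.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Per-database: parameterize all queries; verify resulting plan quality afterward.
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED;
```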

There is also some good information in this white paper about plan caching and compilation.