Sql-server – Collecting Wait Stats

performanceperformance-tuningsql serversql-server-2012wait-types

We currently use a monitoring tool which shows us our top wait stats either by number of waiting tasks or total wait time. Below are the wait stats by number of tasks waiting and also wait time per task.

We have users complaining of slow down in the system, yet the metrics for server seem fine in terms of Disk IO, memory and CPU. Does anyone know if the PREEMPTIVE waits are an issue?

Number of waiting tasks
SOS_SCHEDULER_YIELD
PAGELATCH_EX
PAGELATCH_SH
PREEMPTIVE_XE_CALLBACKEXECUTE
PREEMPTIVE_XE_GETTARGETSTATE
PREEMPTIVE_XE_SESSIONCOMMIT

Average wait per task
PAGEIOLATCH_SH
PREEMPTIVE_XE_GETTARGETSTATE

Update:
I ran a query from Paul Randal similar to what you posted and got the following:

WaitType    Wait_S  Resource_S  Signal_S    WaitCount   Percentage  AvgWait_S   AvgRes_S    AvgSig_S
PREEMPTIVE_XE_GETTARGETSTATE    9704.81 9704.81 0.00    604647  44.60   0.0161  0.0161  0.0000

I know this doesn't line up very well, but basically this wait type has accounted for %44.60 of all wait types. Also as there has been no Signal Waits on this type then this indicates no CPU pressure but instead a wait on some other resource. Not sure how I deduce what that resource is however.

Also this is SQL 2012 SP1

Update2
AS requested here is results from your query. In regards extended events the only sessions running is the default system_health one and 2 SharePoint ones I have just noticed, they must have been put there by default. I may switch these off I wonder if these are causing an issue.

Its interesting that my PREEMPTIVE_XE_GETTARGETSTATE doesn't seem to be in this list.

wait_type   wait_time_ms    signal_wait_time_ms resource_wait_time_ms   percent_total_waits percent_total_signal_waits  percent_total_resource_waits
SP_SERVER_DIAGNOSTICS_SLEEP 300014  355508314   0   24.621089361251698  99.883069863550302  0.000000000000000
MSQL_XP 96782   0   4268999 0.295653861591811   0.000000000000000   0.295653861591811
ASYNC_IO_COMPLETION 56193   64  345552  0.023935987107964   0.000017981341700   0.023931554722962
BACKUPTHREAD    41100   6850    265025  0.018828979257262   0.001924565478840   0.018354575549998
LCK_M_U 40500   71  41030   0.002846491499596   0.000019948050948   0.002841574322484
PWAIT_ALL_COMPONENTS_INITIALIZED    31422   0   94205   0.006524262955146   0.000000000000000   0.006524262955146
XE_LIVE_TARGET_TVF  28050   0   33458   0.002317167771915   0.000000000000000   0.002317167771915
LCK_M_X 4027    50  29195   0.002025392177944   0.000014047923203   0.002021929377161
SQLTRACE_INCREMENTAL_FLUSH_SLEEP    4018    613 355390660   24.612983567922965  0.000172227538471   24.612941113985366
CXPACKET    3756    1   14755   0.001021941767062   0.000000280958464   0.001021872511047

Best Answer

SQL Server works in Non preemtive mode which means that if SQLOS asks it to yeild because it got request from windows OS it will ask SQL server to yeild and SQL Server will listen to it and will yeild or will do as SQLOS has asked it to do. This is because SQL server runs as application and is allocated resources by SQLOS which is monitored by windows O. Preemtive wait types occur when SQL server is executing a task and is interrupted by OS and asked to give up the thread it is using so that can be allocated for other tasks and SQL will do it. It will yeild and will wait till the thread is available this waiting will come under PREEMTIVE-XXX waits.

What is version fo SQL server here is it patched to latest Service pack there was a bug in SQL server 2008 which points incorrect value of preemtive wait types.

Can you run sys.dm_exec_requests DMV and see if the process getting preemtive wait types are suspended or running.

Can you please post output of below query(By Jonathan Kehayias) to capture wait stats

SELECT TOP 10
wait_type ,
max_wait_time_ms wait_time_ms ,
signal_wait_time_ms ,
wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,
100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( )
AS percent_total_waits ,
100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( )
AS percent_total_signal_waits ,
100.0 * ( wait_time_ms - signal_wait_time_ms )
/ SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0 -- remove zero wait_time
AND wait_type NOT IN -- filter out additional irrelevant waits
( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH',
'SQLTRACE_BUFFER_FLUSH','CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',
'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK', 'SLEEP_BPOOL_FLUSH',
'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT', 'FT_IFTSHC_MUTEX',
'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'BROKER_TRANSMITTER', 'FT_IFTSHC_MUTEX', 'KSOURCE_WAKEUP',
'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',
'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',
'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS',
'WAITFOR', 'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN',
'RESOURCE_QUEUE' )
ORDER BY wait_time_ms DES

What other processes are running on system is OS under load how many CPU cores does system have ?

EDIT: After user pasted output

The output does not gives and concrete picture use my query. Also are you running extended events traces because i can see XE wait types. I consider this wait type as harmful and it seems you are not facing a issue. Monitoring tools sometimes overreact so with result you posted I would just think its normal behavior.

EDIT2: I dont find any issue with the wait stats output you posted. I also assume your server was not restarted recently otherwise waits stats will not be useful.

Related Solutions

Sql-server – Attempting to tune a Sharepoint site using SQL Server Waits and Queues

As this is SharePoint your hands are tied. You can't create indexes or statistics without loosing support from Microsoft.

CXPACKET just means that one thread of a parallized query is waiting for another thread to finish waiting for something else. SOS_SCHEDULER_YIELD can mean a few things, typically it means that you need to fix your indexes are you have queries that aren't tuned correctly so the SQL Server is pausing those queries (yielding them) so that it can work on other stuff (as this is SharePoint you can't really do anything about this). OLEDB means that the SQL Server is waiting on the OLEDB driver so probably there are linked server queries or something along those lines that are calling out to another database server. There's nothing you can do about that either.

From what it sounds like there's about nothing you can do without throwing more hardware at the problem. Use Profiler, Server Side Tracing or Extended Events and find the queries which are taking to long to run and check their execution plans and see how bad that are. Odds are those search queries are doing large scan operations.

How big are the content databases? Microsoft recommends keeping the content databases under 100 Gigs so that scanning is kept to a minimum.

Sql-server – Intermittent RESOURCE_SEMAPHORE_QUERY_COMPILE Wait Stats

I believe you will see this symptom if you have a LOT of large query plans that are fighting for memory in order to compile (this has very little to do with running the query itself). To hit this, I suspect you are using an ORM or some kind of application that generates many unique but relatively complex queries. SQL Server could be under memory pressure because of things like large query operations, but on further thought it is more likely just that your system is configured with far less memory than it needs (either there is never enough memory to satisfy all of the queries you're trying to compile, or there are other processes on the box that are stealing memory from SQL Server).

You can take a look at what SQL Server is configured with using:

EXEC sp_configure 'max server memory';    -- max configured in MB

SELECT counter_name, cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name IN
  (
    'Total Server Memory (KB)',    -- max currently granted
    'Target Server Memory (KB)'    -- how much SQL Server wished it had
  );

You can identify the cached plans that required the most compile memory with the following Jonathan Kehayias query, adapted slightly:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10) CompileTime_ms, CompileCPU_ms, CompileMemory_KB,
  qs.execution_count,
  qs.total_elapsed_time/1000.0 AS duration_ms,
  qs.total_worker_time/1000.0 as cputime_ms,
  (qs.total_elapsed_time/qs.execution_count)/1000.0 AS avg_duration_ms,
  (qs.total_worker_time/qs.execution_count)/1000.0 AS avg_cputime_ms,
  qs.max_elapsed_time/1000.0 AS max_duration_ms,
  qs.max_worker_time/1000.0 AS max_cputime_ms,
  SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
    (CASE qs.statement_end_offset
      WHEN -1 THEN DATALENGTH(st.text) ELSE qs.statement_end_offset
     END - qs.statement_start_offset) / 2 + 1) AS StmtText,
  query_hash, query_plan_hash
FROM
(
  SELECT 
    c.value('xs:hexBinary(substring((@QueryHash)[1],3))', 'varbinary(max)') AS QueryHash,
    c.value('xs:hexBinary(substring((@QueryPlanHash)[1],3))', 'varbinary(max)') AS QueryPlanHash,
    c.value('(QueryPlan/@CompileTime)[1]', 'int') AS CompileTime_ms,
    c.value('(QueryPlan/@CompileCPU)[1]', 'int') AS CompileCPU_ms,
    c.value('(QueryPlan/@CompileMemory)[1]', 'int') AS CompileMemory_KB,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS n(c)
) AS tab
JOIN sys.dm_exec_query_stats AS qs ON tab.QueryHash = qs.query_hash
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY CompileMemory_KB DESC
OPTION (RECOMPILE, MAXDOP 1);

You can see how the plan cache is being used with the following:

SELECT objtype, cacheobjtype,
    AVG(size_in_bytes*1.0)/1024.0/1024.0,
    MAX(size_in_bytes)/1024.0/1024.0,
    SUM(size_in_bytes)/1024.0/1024.0,
    COUNT(*)
FROM sys.dm_exec_cached_plans
GROUP BY GROUPING SETS ((),(objtype, cacheobjtype))
ORDER BY objtype, cacheobjtype;

When you are experiencing high semaphore waits, check to see if these query results vary significantly from during "normal" activity:

SELECT resource_semaphore_id, -- 0 = regular, 1 = "small query"
  pool_id,
  available_memory_kb,
  total_memory_kb,
  target_memory_kb
FROM sys.dm_exec_query_resource_semaphores;

SELECT StmtText = SUBSTRING(st.[text], (qs.statement_start_offset / 2) + 1,
        (CASE qs.statement_end_offset
          WHEN -1 THEN DATALENGTH(st.text) ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1),
  r.start_time, r.[status], DB_NAME(r.database_id), r.wait_type, 
  r.last_wait_type, r.total_elapsed_time, r.granted_query_memory,
  m.requested_memory_kb, m.granted_memory_kb, m.required_memory_kb,
  m.used_memory_kb
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_exec_query_stats AS qs
ON r.plan_handle = qs.plan_handle
INNER JOIN sys.dm_exec_query_memory_grants AS m
ON r.request_id = m.request_id
AND r.plan_handle = m.plan_handle
CROSS APPLY sys.dm_exec_sql_text(r.plan_handle) AS st;

And you may also want to look and see how memory is distributed:

DBCC MEMORYSTATUS;

And there is some good information here about why you might be seeing a high number of compiles/recompiles (which will contribute to that wait):

http://technet.microsoft.com/en-us/library/ee343986(v=sql.100).aspx

http://technet.microsoft.com/en-us/library/cc293620.aspx

You can check for high compile/recompile counts using the following counters:

SELECT counter_name, cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name IN 
  (
    'SQL Compilations/sec',
    'SQL Re-Compilations/sec'
  );

And you can check for internal memory pressure leading to evictions - non-zero counters here would indicate that something not good is going on with the plan cache:

SELECT * FROM sys.dm_os_memory_cache_clock_hands 
  WHERE [type] IN (N'CACHESTORE_SQLCP', N'CACHESTORE_OBJCP');

NOTE Most of these metrics don't have a magic "oh my gosh I need to panic or do something!" threshold. What you need to do is to take measurements during normal system activity, and determine where these thresholds are for your hardware, configuration and workload. When you ~~panic~~ do something is when two conditions are true:

the metrics vary significantly from normal values; and,
there is actually a performance problem occurring (like your CPU spikes) - but only if they are actually interfering with anything. Other than seeing the CPUs spike, are you seeing any other symptom? In other words, is the spike the symptom, or is the spike causing other symptoms? Would users of the system ever notice? A lot of people always go after their highest wait consumer, simply because it's the highest. Something is always going to be the highest wait consumer - you have to know that it's varying enough from normal activity that it indicates a problem or some significant change.

Optimize for ad hoc workloads is a great setting for 99% of the workloads out there, but it will not be very helpful in reducing compilation costs - it is aimed at reducing plan cache bloat by preventing a single-use plan from storing the whole plan until it's been executed twice. Even when you only store the stub in the plan cache, you still have to compile the full plan for the execution of the query. Perhaps what @Kahn meant to recommend was setting the database level parameterization to forced, which will potentially provide better plan re-use (but it really depends on how unique all of these high-cost queries are).

Also some good information in this white paper about plan caching and compilation.

Best Answer

Related Solutions

Sql-server – Attempting to tune a Sharepoint site using SQL Server Waits and Queues

Sql-server – Intermittent RESOURCE_SEMAPHORE_QUERY_COMPILE Wait Stats

Related Question