Sql-server – Troubleshooting SOS_SCHEDULER_YIELD wait

Tags: performance, sql-server, sql-server-2012, wait-types

Running our corporate ERP (Dynamics AX 2012), I noticed our production environment seemed much slower than our development systems.

After performing the same activities in both the development and production environments while running a trace, I confirmed that SQL queries were executing very slowly in production compared to development (10-50x slower on average).

At first I attributed this to load, so I re-ran the same activities in the production environment during off-hours, and the trace showed the same results.

I cleared the wait statistics in SQL Server, let the server run under its normal production load for a while, and then ran this query:

WITH [Waits] AS
    (SELECT
        [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats
    WHERE [wait_type] NOT IN (
        N'CLR_SEMAPHORE',    N'LAZYWRITER_SLEEP',
        N'RESOURCE_QUEUE',   N'SQLTRACE_BUFFER_FLUSH',
        N'SLEEP_TASK',       N'SLEEP_SYSTEMTASK',
        N'WAITFOR',          N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
        N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH',
        N'XE_TIMER_EVENT',   N'XE_DISPATCHER_JOIN',
        N'LOGMGR_QUEUE',     N'FT_IFTS_SCHEDULER_IDLE_WAIT',
        N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT',
        N'CLR_AUTO_EVENT',   N'DISPATCHER_QUEUE_SEMAPHORE',
        N'TRACEWRITE',       N'XE_DISPATCHER_WAIT',
        N'BROKER_TO_FLUSH',  N'BROKER_EVENTHANDLER',
        N'FT_IFTSHC_MUTEX',  N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        N'DIRTY_PAGE_POLL',  N'SP_SERVER_DIAGNOSTICS_SLEEP')
    AND [waiting_tasks_count] > 0 -- skip unused wait types (avoids divide-by-zero in the averages)
    )
SELECT
    [W1].[wait_type] AS [WaitType],
    CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S],
    CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S],
    CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S],
    [W1].[WaitCount] AS [WaitCount],
    CAST ([W1].[Percentage] AS DECIMAL(5, 2)) AS [Percentage], -- DECIMAL(4, 2) would overflow at 100.00
    CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S],
    CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S],
    CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S]
FROM [Waits] AS [W1] INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS],
    [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage]
HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; -- percentage threshold
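The clearing step mentioned above is usually done with DBCC SQLPERF. A minimal sketch, assuming you have permission to reset server-wide counters; keep in mind that this resets the numbers for every other tool or person monitoring the instance:

```sql
-- Reset the cumulative counters in sys.dm_os_wait_stats.
-- This is server-wide: everyone relying on these numbers loses their history.
DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);
GO

-- Let the normal workload run for a representative period, then re-run the
-- wait-stats query against the fresh counters.
```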

My results are as follows:

WaitType               Wait_S  Resource_S  Signal_S  WaitCount  Percentage  AvgWait_S  AvgRes_S  AvgSig_S
SOS_SCHEDULER_YIELD   4162.52        3.64   4158.88    4450085       77.33     0.0009    0.0000    0.0009
ASYNC_NETWORK_IO       457.98      331.59    126.39     351113        8.51     0.0013    0.0009    0.0004
PAGELATCH_EX           252.94        5.14    247.80     796348        4.70     0.0003    0.0000    0.0003
WRITELOG               166.01       48.01    118.00     302209        3.08     0.0005    0.0002    0.0004
LCK_M_U                145.47      145.45      0.02        123        2.70     1.1827    1.1825    0.0002

So the largest wait by far is SOS_SCHEDULER_YIELD. From searching around, I found that it typically indicates that the CPU cannot keep up.
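Nearly all of the SOS_SCHEDULER_YIELD time in the table above is signal wait (4158.88 of 4162.52 seconds), i.e. time a task spent in the runnable queue waiting to get back onto a CPU. A quick way to see how much of the overall wait time is signal wait is a query like the one below. The benign-wait filter here is deliberately abbreviated; substitute the full exclusion list from the earlier query for real use:

```sql
-- Share of total wait time spent waiting for a CPU (signal wait) versus
-- waiting for the resource itself. A high signal share points at CPU scheduling pressure.
SELECT
    CAST(100.0 * SUM(signal_wait_time_ms)
         / NULLIF(SUM(wait_time_ms), 0) AS DECIMAL(5, 2)) AS [SignalWaitPct],
    CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms)
         / NULLIF(SUM(wait_time_ms), 0) AS DECIMAL(5, 2)) AS [ResourceWaitPct]
FROM sys.dm_os_wait_stats
-- Abbreviated filter: reuse the full NOT IN list from the wait-stats query.
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'WAITFOR');
```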

I then ran this query multiple times in succession.

SELECT *
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255

I know I'm supposed to look for schedulers with a non-zero runnable_tasks_count or pending_disk_io_count, but both are zero almost all the time.
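Because individual manual runs can easily miss short bursts, one option is to sample sys.dm_os_schedulers in a loop and look at the peaks. A rough sketch; the temp table name, the sample count, and the one-second interval are arbitrary choices:

```sql
IF OBJECT_ID('tempdb..#SchedulerSamples') IS NOT NULL
    DROP TABLE #SchedulerSamples;

CREATE TABLE #SchedulerSamples (
    SampleTime    DATETIME2(3) NOT NULL,
    CurrentTasks  INT          NOT NULL,
    RunnableTasks INT          NOT NULL,
    PendingDiskIO INT          NOT NULL
);

-- Take 20 samples, one second apart, summed across the user schedulers.
DECLARE @i INT = 0;
WHILE @i < 20
BEGIN
    INSERT INTO #SchedulerSamples (SampleTime, CurrentTasks, RunnableTasks, PendingDiskIO)
    SELECT SYSDATETIME(),
           SUM(current_tasks_count),
           SUM(runnable_tasks_count),
           SUM(pending_disk_io_count)
    FROM sys.dm_os_schedulers
    WHERE scheduler_id < 255            -- user schedulers only
      AND status = N'VISIBLE ONLINE';

    SET @i += 1;
    WAITFOR DELAY '00:00:01';
END;

-- Peaks and averages over the sampling window.
SELECT MAX(RunnableTasks)       AS [MaxRunnable],
       AVG(RunnableTasks * 1.0) AS [AvgRunnable],
       MAX(PendingDiskIO)       AS [MaxPendingIO]
FROM #SchedulerSamples;
```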

I should also mention that Max Degree of Parallelism was set to 1, since the Dynamics AX workload is typically OLTP in nature. Changing it to 8 did not make much difference: the wait stats stayed almost exactly the same, and so did the performance problems.
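For reference, the server-wide setting mentioned above is changed with sp_configure. A minimal sketch using the value of 1 described in the question (substitute 8 to repeat the other test):

```sql
-- 'max degree of parallelism' is an advanced option, so expose it first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'max degree of parallelism', 1;
RECONFIGURE;

-- Confirm the configured and running values.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'max degree of parallelism';
```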

I'm at a loss as to where to go from here. I have a SQL Server that seems to be CPU-starved, yet it has almost no runnable tasks and is not waiting on IO.
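To check whether the box is actually CPU-bound, SQL Server keeps roughly one CPU-utilization sample per minute in the scheduler-monitor ring buffer. The sketch below follows the commonly circulated pattern for reading it; sys.dm_os_ring_buffers is only partly documented, so treat the XML paths as an assumption to verify on your build. Note that these percentages describe how busy the CPUs were, not how fast they were clocked.

```sql
-- Current tick count, used to turn ring-buffer timestamps into wall-clock times.
DECLARE @ts_now BIGINT = (SELECT ms_ticks FROM sys.dm_os_sys_info);

SELECT TOP (30)
    DATEADD(ms, -1 * (@ts_now - samples.[timestamp]), GETDATE()) AS [EventTime],
    samples.SQLProcessUtilization,
    samples.SystemIdle,
    100 - samples.SystemIdle - samples.SQLProcessUtilization AS [OtherProcessUtilization]
FROM (
    SELECT
        rb.[timestamp],
        rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
            AS SystemIdle,
        rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int')
            AS SQLProcessUtilization
    FROM (
        SELECT [timestamp], CONVERT(XML, record) AS record
        FROM sys.dm_os_ring_buffers
        WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
          AND record LIKE N'%<SystemHealth>%'
    ) AS rb
) AS samples
ORDER BY samples.[timestamp] DESC;
```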

I do know that the IO subsystem on this server isn't very good: running SQLIO against the drive containing the databases can produce quite low numbers (around 10 MB/s for certain types of reads and writes). That said, SQL Server doesn't seem to be waiting on it, because the server has enough memory to cache most of the databases.
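One way to check the claim that SQL Server is not waiting on the slow disks is to look at per-file IO latency. A minimal sketch using sys.dm_io_virtual_file_stats; note that these counters accumulate from instance startup and are not reset by clearing the wait statistics:

```sql
-- Average read/write latency per database file, in milliseconds.
SELECT
    DB_NAME(vfs.database_id) AS [DatabaseName],
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    CAST(vfs.io_stall_read_ms * 1.0 / NULLIF(vfs.num_of_reads, 0)
         AS DECIMAL(10, 2)) AS [AvgReadLatency_ms],
    CAST(vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0)
         AS DECIMAL(10, 2)) AS [AvgWriteLatency_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
        ON mf.database_id = vfs.database_id
       AND mf.[file_id] = vfs.[file_id]
ORDER BY [AvgReadLatency_ms] DESC;
```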

Here is some environment information to help:

Production environment:

SQL Server
HP ProLiant DL360p Gen8
Intel Xeon E5-2650 0 @ 2.00GHz x 2 with hyperthreading (32 logical cores)
184GB memory
Windows Server 2012
2 instances of SQL Server 2012 Standard (RTM, unpatched)
RAID 1 of 279 GB 15k drives (C: drive), containing the databases and the operating system
Page File and TempDB on distinct, separate drives (solid state)

Development environment:

Hyper-V hosted SQL Server and Dynamics AX 2012 AOS server
Core i7 3.4 GHz with hyperthreading (8 logical cores)
8GB of memory
Windows Server 2008 R2
SSD for the entire VM.

I would welcome any input on other things to look for.

Best Answer

So I resolved this. It turns out that power management features were enabled on our SQL Server host, scaling the CPU frequency up and down, but not quickly enough to keep up with the relatively light demand, and this introduced the SOS_SCHEDULER_YIELD waits. After switching the server to always run in High Performance mode, the issue went away, and the waits now look more normal (latch and IO-type waits).
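To check which power plan Windows is using, the powercfg utility can be called from T-SQL. This sketch assumes xp_cmdshell is already enabled, which it is not by default and which many shops deliberately keep disabled; in that case, run powercfg /getactivescheme directly in a command prompt on the server instead.

```sql
-- Assumes xp_cmdshell is enabled (it is off by default).
-- Prints the active Windows power scheme, which should be "High performance"
-- on a database server after the fix described above.
EXEC master..xp_cmdshell N'powercfg /getactivescheme';
```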

Related Solutions

Sql-server – SQL Server statements intermittently slow on SQL Server 2008 R2

Thank you for the detailed explanation of your problem (one of the best laid out questions actually).

WRITELOG is a very common wait type, so don't worry about it. The SOS_SCHEDULER_YIELD waits indicate CPU pressure, and together with the CXPACKET waits they suggest that some indexes may be missing and that your queries may be retrieving a lot of data for an OLTP system. I suggest looking at the missing-index DMVs to see whether there are any indexes (there will almost certainly be more than a few) related to the problematic procedures.

http://sqlfool.com/2009/04/a-look-at-missing-indexes/

http://troubleshootingsql.com/2009/12/30/how-to-find-out-the-missing-indexes-on-a-sql-server-2008-or-2005-instance-along-with-the-create-index-commands/

Look for Jonathan Kehayias's post on sqlblog.com on this too.
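As a starting point for the missing-index DMVs mentioned above, a sketch like the following joins the three related views and ranks the suggestions with a commonly used heuristic (average cost * average impact * (seeks + scans)). That formula is a rule of thumb, not an official metric, and every suggestion should be reviewed before anything is created:

```sql
-- Missing-index suggestions recorded since the last restart, ranked by a heuristic.
SELECT TOP (25)
    DB_NAME(mid.database_id) AS [DatabaseName],
    mid.[statement]          AS [TableName],
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.user_scans,
    migs.avg_total_user_cost,
    migs.avg_user_impact,
    migs.avg_total_user_cost * migs.avg_user_impact
        * (migs.user_seeks + migs.user_scans) AS [EstimatedBenefit]
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
ORDER BY [EstimatedBenefit] DESC;
```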

Also, look into parameter sniffing.

http://sommarskog.se/query-plan-mysteries.html

http://pratchev.blogspot.com/2007/08/parameter-sniffing.html
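One common mitigation for parameter sniffing is to request a plan compiled for the actual parameter value with OPTION (RECOMPILE). The procedure and table names below are hypothetical, purely to show where the hint goes; recompiling on every call costs CPU, so it suits statements that run infrequently or whose parameter values lead to very different plans:

```sql
-- Hypothetical procedure and table, only to illustrate the hint.
CREATE PROCEDURE dbo.usp_GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT o.OrderId, o.OrderDate, o.Amount
    FROM dbo.Orders AS o                 -- hypothetical table
    WHERE o.CustomerId = @CustomerId
    OPTION (RECOMPILE);                  -- new plan for the actual @CustomerId on every call
    -- Alternative: OPTION (OPTIMIZE FOR UNKNOWN) plans for average density
    -- instead of the sniffed value, trading plan quality for stability.
END;
GO
```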

It's NOT a complete answer to your needs, but it's a good starting point. Let us know if you need more details.

Sql-server – SQL Server performance: PREEMPTIVE_OS_DELETESECURITYCONTEXT dominant wait type

I know this question, based on the title, is mainly concerned with the PREEMPTIVE_OS_DELETESECURITYCONTEXT wait type, but I believe that is a misdirection from the true issue, which is "a customer who was complaining about high CPU usage on their SQL Server".

I believe that focusing on this specific wait type is a wild goose chase because it goes up for every connection made. I am running the following query on my laptop (meaning I am the only user):

SELECT * 
FROM sys.dm_os_wait_stats
WHERE wait_type = N'PREEMPTIVE_OS_DELETESECURITYCONTEXT'

And then I do any of the following and re-run this query:

open a new query tab
close the new query tab
run the following from a command prompt: SQLCMD -E -Q "select 1"
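To see the counter move without relying on re-running the query by hand, you can take a before/after snapshot. A minimal sketch; the 30-second window is arbitrary, and during it you would open or close a query tab or run the SQLCMD command from another window:

```sql
DECLARE @before BIGINT;

-- Snapshot the current count for this wait type.
SELECT @before = waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type = N'PREEMPTIVE_OS_DELETESECURITYCONTEXT';

-- While this waits, open/close a query tab or run SQLCMD -E -Q "select 1" elsewhere.
WAITFOR DELAY '00:00:30';

-- Number of new waits recorded during the window.
SELECT waiting_tasks_count - @before AS [NewWaits]
FROM sys.dm_os_wait_stats
WHERE wait_type = N'PREEMPTIVE_OS_DELETESECURITYCONTEXT';
```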

Now, we know that CPU usage is high, so we should look at what is currently running to see which sessions are using the most CPU:

SELECT req.session_id AS [SPID],
       req.blocking_session_id AS [BlockedBy],
       req.logical_reads AS [LogReads],
       DB_NAME(req.database_id) AS [DatabaseName],
       SUBSTRING(txt.[text],
                 (req.statement_start_offset / 2) + 1,
                 CASE
                     WHEN req.statement_end_offset > 0
                        THEN ((req.statement_end_offset - req.statement_start_offset) / 2) + 1
                     ELSE LEN(txt.[text])
                 END
                ) AS [CurrentStatement],
       txt.[text] AS [CurrentBatch],
       CONVERT(XML, qplan.query_plan) AS [StatementQueryPlan],
       OBJECT_NAME(qplan.objectid, qplan.[dbid]) AS [ObjectName],
       sess.[program_name],
       sess.[host_name],
       sess.nt_user_name,
       sess.total_scheduled_time,
       sess.memory_usage,
       req.*
FROM sys.dm_exec_requests req
INNER JOIN sys.dm_exec_sessions sess
        ON sess.session_id = req.session_id
CROSS APPLY sys.dm_exec_sql_text(req.[sql_handle]) txt
OUTER APPLY sys.dm_exec_text_query_plan(req.plan_handle,
                                        req.statement_start_offset,
                                        req.statement_end_offset) qplan
WHERE req.session_id <> @@SPID
ORDER BY req.logical_reads DESC, req.cpu_time DESC
--ORDER BY req.cpu_time DESC, req.logical_reads DESC

I usually run the above query as it is, but you could also switch which ORDER BY clause is commented out to see if that gives more interesting / helpful results.

Alternatively, you can run the following queries, based on sys.dm_exec_query_stats, to find the highest-cost queries. The first query below shows individual queries (even if they have multiple plans) ordered by average CPU time, but you can easily change that to average logical reads. Once you find a query that looks like it is taking a lot of resources, copy its "sql_handle" and "statement_start_offset" values into the WHERE condition of the second query below to see the individual plans (there can be more than one). Scroll to the far right: if an XML plan is available, it displays as a link (in Grid mode) that opens the plan viewer when clicked.

Query #1: Get Query Info

;WITH cte AS
(
   SELECT qstat.[sql_handle],
          qstat.statement_start_offset,
          qstat.statement_end_offset,
          COUNT(*) AS [NumberOfPlans],
          SUM(qstat.execution_count) AS [TotalExecutions],

          SUM(qstat.total_worker_time) AS [TotalCPU],
          (SUM(qstat.total_worker_time * 1.0) / SUM(qstat.execution_count)) AS [AvgCPUtime],
          MAX(qstat.max_worker_time) AS [MaxCPU],

          SUM(qstat.total_logical_reads) AS [TotalLogicalReads],
          (SUM(qstat.total_logical_reads * 1.0) / SUM(qstat.execution_count)) AS [AvgLogicalReads],
          MAX(qstat.max_logical_reads) AS [MaxLogicalReads],

          SUM(qstat.total_rows) AS [TotalRows],
          (SUM(qstat.total_rows * 1.0) / SUM(qstat.execution_count)) AS [AvgRows],
          MAX(qstat.max_rows) AS [MaxRows]
   FROM sys.dm_exec_query_stats  qstat
   GROUP BY qstat.[sql_handle], qstat.statement_start_offset, qstat.statement_end_offset
)
SELECT  cte.*,
        DB_NAME(txt.[dbid]) AS [DatabaseName],
        SUBSTRING(txt.[text],
                  (cte.statement_start_offset / 2) + 1,
                  CASE
                      WHEN cte.statement_end_offset > 0
                          THEN ((cte.statement_end_offset - cte.statement_start_offset) / 2) + 1
                      ELSE LEN(txt.[text])
                  END
                 ) AS [CurrentStatement],
        txt.[text] AS [CurrentBatch]
FROM cte
CROSS APPLY sys.dm_exec_sql_text(cte.[sql_handle]) txt
ORDER BY cte.AvgCPUtime DESC

Query #2: Get Plan Info

SELECT  *,
        DB_NAME(qplan.[dbid]) AS [DatabaseName],
        CONVERT(XML, qplan.query_plan) AS [StatementQueryPlan],
        SUBSTRING(txt.[text],
                  (qstat.statement_start_offset / 2) + 1,
                  CASE
                        WHEN qstat.statement_end_offset > 0
                        THEN ((qstat.statement_end_offset - qstat.statement_start_offset) / 2) + 1
                        ELSE LEN(txt.[text])
                  END
                 ) AS [CurrentStatement],
        txt.[text] AS [CurrentBatch]
FROM sys.dm_exec_query_stats  qstat
CROSS APPLY sys.dm_exec_sql_text(qstat.[sql_handle]) txt
OUTER APPLY sys.dm_exec_text_query_plan(qstat.plan_handle,
                                        qstat.statement_start_offset,
                                        qstat.statement_end_offset) qplan
-- paste info from Query #1 below
WHERE qstat.[sql_handle] = 0x020000001C70C614D261C85875D4EF3C90BD18D02D62453800....
AND qstat.statement_start_offset = 164
-- paste info from Query #1 above
ORDER BY qstat.total_worker_time DESC
