Sql-server – Affinity side effects / TF 8002- how to measure

configurationquery-performancesql serversql-server-2016trace-flags

Environment: We have Microsoft SQL Server 2016 (SP2-CU15) Standard Edition (64-bit) on Windows Server 2016 Standard 10.0.
We have multiple SE instances on one physical server.
We use processor affinity mask to tell which instance use what cores. Some cores are dedicated to instance, but most cores are shared between more than one instance. (Without using affinity mask we would not be able to use all physical machine power)

Problem: we have noticed that one instance is disturbing another one in case there is some heavy CPU load is running. Especially if that is one (!) big query running for long period of time. There always seems to be enough CPU power left available to affected instance not to experience visible slowdowns.
I do believe that trace flag 8002 would help in this situation as described in this link.

What is the the wait stats (or other metrics) that I should monitor to ensure that trace flag really gives desired positive effect?

Best Answer

Since these instances are SQL Server Standard Edition, this will have to be without the functioning or performance counters provided by the SQL Server Resource Governor, which is an Enterprise Edition feature.

I recommend capturing the [\Process(sqlservr*)% Processor Time] perfmon counters (on a scale of logical processors * 100%), and the [\Processor Information(*)% Processor Time] counters. You'll want to note which SQL Server process is affnitized to which logical processors, since sharing is happening. Within the SQL Server instances, I recommend capturing wait_time_ms for SOS_SCHEDULER_YIELD type, as well as signal_wait_time_ms across all wait types. SOS_SCHEDULER_YIELD is the wait that is incurred when a task has used up its quantum on the scheduler and must yield, waiting for its next turn to get on the scheduler and run again. You'll notice that the signal_wait_time_ms for SOS_SCHEDULER_YIELD is almost if not equal to wait_time_ms. That's because once it's yielded that task/worker really is ready to get back on the scheduler as soon as possible. And the signal_wait_time_ms across all wait types (including SOS_SCHEDULER_YIELD) indicates how much time is spent ready to run but just waiting for a turn after the wait resources are available. The hope is that under a similar workload, trace flag 8002 and the "opportunistic" binding of schedulers to underlying logical CPUs will result in increased rate of work, higher CPU utilization, and decreased times for SOS_SCHEDULER_YIELD and signal wait time across all waits.