Sql-server – Affinity side effects / TF 8002- how to measure

configurationquery-performancesql serversql-server-2016trace-flags

Environment: We have Microsoft SQL Server 2016 (SP2-CU15) Standard Edition (64-bit) on Windows Server 2016 Standard 10.0.
We have multiple SE instances on one physical server.
We use processor affinity mask to tell which instance use what cores. Some cores are dedicated to instance, but most cores are shared between more than one instance. (Without using affinity mask we would not be able to use all physical machine power)

Problem: we have noticed that one instance is disturbing another one in case there is some heavy CPU load is running. Especially if that is one (!) big query running for long period of time. There always seems to be enough CPU power left available to affected instance not to experience visible slowdowns.
I do believe that trace flag 8002 would help in this situation as described in this link.

What is the the wait stats (or other metrics) that I should monitor to ensure that trace flag really gives desired positive effect?

Best Answer

Since these instances are SQL Server Standard Edition, this will have to be without the functioning or performance counters provided by the SQL Server Resource Governor, which is an Enterprise Edition feature.

I recommend capturing the [\Process(sqlservr*)% Processor Time] perfmon counters (on a scale of logical processors * 100%), and the [\Processor Information(*)% Processor Time] counters. You'll want to note which SQL Server process is affnitized to which logical processors, since sharing is happening. Within the SQL Server instances, I recommend capturing wait_time_ms for SOS_SCHEDULER_YIELD type, as well as signal_wait_time_ms across all wait types. SOS_SCHEDULER_YIELD is the wait that is incurred when a task has used up its quantum on the scheduler and must yield, waiting for its next turn to get on the scheduler and run again. You'll notice that the signal_wait_time_ms for SOS_SCHEDULER_YIELD is almost if not equal to wait_time_ms. That's because once it's yielded that task/worker really is ready to get back on the scheduler as soon as possible. And the signal_wait_time_ms across all wait types (including SOS_SCHEDULER_YIELD) indicates how much time is spent ready to run but just waiting for a turn after the wait resources are available. The hope is that under a similar workload, trace flag 8002 and the "opportunistic" binding of schedulers to underlying logical CPUs will result in increased rate of work, higher CPU utilization, and decreased times for SOS_SCHEDULER_YIELD and signal wait time across all waits.

Related Solutions

Sql-server – SQL Server 2012: NUMA node shown as offline without special affinity settings

Found the answer: http://blogs.msdn.com/b/sql_shep/archive/2012/06/21/sql-server-2012-license-core-limitaion.aspx

My consultants installed the wrong version (CAL instead of core), so it's limited to 20 cores.

Sql-server – Should we really turn on trace flag 1118 for every SQL instance

For SQL Server 2000 → 2014

I generally agree with Paul Randal, that this is something you could always turn on, but I do have a little bit of hesitation to say it should always be on and you should leave it on regardless of what you observe (with or without adequate testing).

People can have poorly set up tempdb configurations, for example not enough files, files too small, files not equally sized, improper growth settings, instant file initialization not enabled, and bad coding symptoms where temp tables are not cached and reused (see here and here). In this scenario, with the perfect storm workload, it's conceivable that the trace flag could actually lead to worse overall performance. Think about a case where those settings are not optimal and you have tons of users all trying to create their own, not-from-cache copies of temp tables, and they're all using their own dedicated extents instead of sharing them. There may be some level, even if I can't produce it at will, where the overhead of dedicated extents outweighs the allocation contention savings.

Just because Paul hasn't seen a scenario where the scales tip at some threshold doesn't mean it's impossible, just that it's unlikely / uncommon.

You can likely avoid that by following all of the other best practices, because I suspect it really would require a perfect storm, I just do not feel strong enough to say always and without exception.

For SQL Server 2016

In SQL Server 2016, uniform extent allocation is the default behavior - see Bob Dorr's "it just runs faster" post about it. Other than that, many of the same things above hold true.

You can't turn this off for tempdb; so, effectively, trace flag 1118 is always on in 2016, and is permanent. So my suggestion would be to make sure tempdb works effectively under TF 1118 on your current instances and under your current workload, to avoid any surprises when you upgrade (at which point you will no longer have the option to turn it off).

If you want to turn it off for user databases in 2016, you use ALTER DATABASE:

ALTER DATABASE MyDatabase SET MIXED_PAGE_ALLOCATION ON;

Best Answer

Related Solutions

Sql-server – SQL Server 2012: NUMA node shown as offline without special affinity settings

Sql-server – Should we really turn on trace flag 1118 for every SQL instance

Related Question