Sql-server – How to troubleshoot how resource governed query is being throttled in SQL 2012

resource-governorsql serversql-server-2012

I have a parallelized query on SQL 2012 SP1 (ent) that is running in a user defined resource governor pool much much slower (2 seconds versus 125 to 140 seconds under user defined pool = ~75 times slower) than the same query that runs in the default resource pool.

The query plan hash is identical between the two executions and statistics io returns same values for all operations. I've confirmed that execution plan does not have additional spill warnings (both have one), and that the memory grant for the queries are identical.

User defined pool workload group

<QueryPlan DegreeOfParallelism="4" MemoryGrant="2384096" CachedPlanSize="384" CompileTime="3608" CompileCPU="2712" CompileMemory="15672">
        <ThreadStat Branches="5" UsedThreads="20">
          <ThreadReservation NodeId="0" ReservedThreads="20" />
        </ThreadStat>
        <MemoryGrantInfo SerialRequiredMemory="5632" SerialDesiredMemory="2360768" RequiredMemory="28960" DesiredMemory="2384096" RequestedMemory="2384096" GrantWaitTime="0" GrantedMemory="2384096" MaxUsedMemory="93872" />
        <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="209715" EstimatedPagesCached="104857" EstimatedAvailableDegreeOfParallelism="4" />

Default pool

<QueryPlan DegreeOfParallelism="4" MemoryGrant="2384096" CachedPlanSize="384" CompileTime="3016" CompileCPU="2660" CompileMemory="15672">
            <ThreadStat Branches="5" UsedThreads="20">
              <ThreadReservation NodeId="0" ReservedThreads="20" />
            </ThreadStat>
            <MemoryGrantInfo SerialRequiredMemory="5632" SerialDesiredMemory="2360768" RequiredMemory="28960" DesiredMemory="2384096" RequestedMemory="2384096" GrantWaitTime="0" GrantedMemory="2384096" MaxUsedMemory="88768" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="209715" EstimatedPagesCached="104857" EstimatedAvailableDegreeOfParallelism="4" />

Total_worker_time is nearly identical between the queries. Only thing that seems different is the total_elapsed_time. Wait is cxpacket. At this point, I must think this is something that resource governor is doing to throttle the CPU and/or memory. I can't find any evidence of the throttling though. CPU throttled counter does not budge, neither does memory suboptimal plans/sec.

For the workload group I've played with increasing memory grant beyond default (25%), and have played with max dop settings. Nothing makes a difference to execution time unless I make the query execute serially. If I change maxdop of workload group to 1 (serial execution) the query runs in ~3 seconds once in cache (1 second slower than default group). All other levels of maxdop result in ~70 and 75 times slower).

Server info:
2 socket x 6 core server. Server MaxDop = 4. CPU is relatively at rest (~12% utilization) when I execute the tests. Memory on server is not under pressure (~6GB free on 24GB box).

Resource pool configuration

100% max memory

50% CPU (increasing to 100% didn't make any difference)

75% CPU CAP

Workload group

MaxDop 0 (and have tried 0,1,4,7)

Memory Grant (25% – and have tried SQL recommended maximum of 70%)

Any ideas on how to narrow down what is doing the throttling?
I do not see anything obvious from the Microsoft link for troubleshooting it

http://technet.microsoft.com/en-us/library/cc627395(v=sql.105).aspx

Full plan for Default Pool (quick) execution – http://pastebin.com/y0P9J61f

Full plan for User Pool (slow) execution – http://pastebin.com/20hnFSTW

The problem query is second statement in the plans (first one is inconsequential and doesn't run parallel anyways…)

Best Answer

An update - I still don't know how best to detect what type of throttling has gone on, but I did find the culprit in my case. It does seem that the CPU CAP to 75% was causing a much higher decrease in performance than a 25% drop should suggest. Raising the CPU Cap to 100% fixed the problem.

A connect article seems to validate this might be a problem with the CPU Cap implementation itself.

Its not the best answer to this question, and if someone still wants to answer the original "how to tell what throttling is going on..." question. I'd be amenable.

If you are struggling to understand why your queries are being throttled so severely, this might be a solution worth trying for it.

http://connect.microsoft.com/SQLServer/feedback/details/781335/resource-governor-feature-cap-cpu-percent-does-not-work-as-expected