SQL Server – Understanding MAXDOP Behavior for Parallel Queries

numaperformancequery-performancesql serversql-server-2012

For my current server as below

4 Sockets with 10 cores per socket and 20 logical processors per socket. 

Total 80 per SQL licensing: 4 NUMA nodes with HT enabled

I've set CTOP 50 and MAXDOP 8 on this DB servers with just one database of approx 15 TB which is Log Shipping stand by database ie Read only state

I recently changed the MAXDOP from 0 to 8 as i see lots of crapy queries running over 4-5 hours [not all but most of them] assuming were all over the CPU threads when checked their DOP was 64 all the time.

From above i assumed there could be excessive work being done by specific CPU cores when those heavy query starts coming, while some might still be idle. Not sure if this is how it can work.

SO now i've changed to MAXDOP 8 and see DOP for those queries showing as 8 fpr each individual queries ,say for 10 different queries running at this time.

How can i test/check if those queries are actually distributing the
work across all the cores evenly or is it still the case where all of
them might be doing all the work on one NUMA node socket with 20 cores
while remaining 60 are idle?

Yes there are performance issues with reports slow but its one of such case where i want to ensure first from my end if there is something that can be done at my server level while developers or people running the reports tune up their queries. Also this being ready only i cant change indexes or control better update to stats on LS restore occurring every 12 hours .

Best Answer

How can i test/check if those queries are actually distributing the work across all the cores evenly or is it still the case where all of them might be doing all the work on one NUMA node socket with 20 cores while remaining 60 are idle?

I would start by looking at overall CPU utilization on the server. If CPU is relatively high then you know that you don't have the problem that you described. In that situation, you might do one or more of the following to improve throughput:

  • Tune your queries to use less CPU
  • Run fewer queries
  • Get more or better hardware

If CPU is relatively low then you have a different type of CPU problem. In that situation, you might do one or more of the following to improve throughput:

  • Run more queries
  • Check wait stats to see if there's something stopping queries from running or using as much CPU as you'd like

As a rule of thumb, you should not expect that if you run 10 MAXDOP 8 queries then all 80 schedulers will end up with at least one parallel worker. The algorithm for allocating worker threads just isn't that smart. Bob Dorr provides some details here. In my experience with large servers I've found that issuing more queries than I would otherwise like was essential to get all of the schedulers to have a chance of having at least one parallel worker. Parallel worker threads are by default not bound to the same NUMA node. The situation that you're describing (only one busy NUMA node) can happen if your queries on other NUMA nodes finish first.

You may find some of the following links to be helpful if you want to look into the amount of work each CPU is performing: