Sql-server – Unable to understand correlation between MaxWorkers created and CPU utilization

blockingcpuquery-performancesql serversql server 2014

I am having a strange issue which i am unable to understand:

On my server with 64 Logical processors having maxworkers 1472, there are peak periods in day where workers created for an hour lets say goes beyond maxworkers available. Ranges somewhere 1800-1900 in that duration.

From what I can understand and saw is there a burst of blocking happening in that period. Lots of update commands blockings its own process while that UPDATE will block other inserts and selects over that range of couple of minutes.

However CPU does not tend to increase and stays steady at 30% on avg. Should the CPU not spike to 100% or atleast much higher than 30% when workers available are exhausted ? OR those waiting threads are not even on schedulers?

Please help me understand , thanks

Best Answer

Correlation between MaxWorkers created and CPU utilization

I strongly recommend you to read the Thread and Task Architecture Guide doc. I'll quote some relevant parts:

About the higher number of workers and the relation between maxworkers and CPU:

On a very busy SQL Server Database Engine, it's possible to see a number of active tasks that's over the limit set by reserved threads. These tasks can belong to a branch that is not being used anymore and are in a transient state, waiting for cleanup.

About the blocks you're experiencing:

If you suspect that there is a performance problem, it is probably not the availability of worker threads. The cause is more likely something like I/O that is causing the worker threads to wait.

About the CPU consumption not going beyond 30% while having requests blocking each other:

When a worker's quantum expires and is no longer active, the respective task is placed in a FIFO queue in a RUNNABLE state, until it moves to a RUNNING state again, assuming the task won't require access to resources that are not available at the moment, such as a latch or lock, in which case the task would be placed in a SUSPENDED state instead of RUNNABLE, until such time those resources are available.

And here's the list of states a worker can be (from sys.dm_os_workers). Unless it's on the RUNNING state, it's not using the CPU, therefore, you could have way more requests existing at the same time than the 1472 you expected to see if they're not in fact using the CPU.

Worker state. Can be one of the following values:

INIT = Worker is currently being initialized.

RUNNING = Worker is currently running either nonpreemptively or preemptively.

RUNNABLE = The worker is ready to run on the scheduler.

SUSPENDED = The worker is currently suspended, waiting for an event to send it a signal.

How to diagnostic your situation

With that in mind, you should check if you don't have a bottleneck caused by other resources (e.g., disk, network) required by the requests that are blocking the others. Paul Randal's article SQL Server Wait Statistics (or please tell me where it hurts…) has a terrific query to retrieve info about what could be causing trouble on your server. Before you assume the problem is the number of workers, I advise you to run that diagnostic on your server.