SQL Server 2008 R2 – Understanding Parallel Query Execution Errors

parallelismsql serversql-server-2008-r2

Today we experienced a degradation in performance on our production sql server. Durring the time this occurred we logged several "The query processor could not start the necessary thread resources for parallel query execution" errors. The reading that I've done suggests that this has to do with how many CPUs to use when executing a complex query. However when I checked during the outage our CPU Utilization was only at 7%. Is there something else this could be referring too that I haven't come across yet? Is this a likely culprit of the performance degradation or am I chasing a red herring?

My sp_configure values for this are as follows:

name                                minimum maximum config_value run_value
cost threshold for parallelism      0       32767   5            5

Best Answer

Few months ago, I faced similar situation wherein the MAXDOP setting was default and a run away query exhausted all the worker threads.

As Remus pointed out this is called worker thread starvation.

There will be a memory dump created on your server when this condition occured.

If you are on 2008R2+SP1 and up then sys.dm_server_memory_dumps will give you the dump file location as well.

Now back to the problem :

There is 1 scheduler monitor thread per NUMA node and since you have 2 NUMA nodes there will be 2 scheduler monitor threads which are responsible for health checking of all the schedulers every 60 secs for that particular NUMA node while making sure that the scheduler is stuck or not.

Each time a new work request is pulled from the schedulers worker queue, the work processes counter is incremented. So if the scheduler has work request queued and has not processes one of the work requests in 60 sec the scheduler is consider stuck.

Due to a run-away query or extensive parallelism, there arises a condition of worker threads begin exhausted as all the threads are occupied by that single run-away query or excessive prolonged blocking and no work can be done unless that offending process gets killed.

Your best bet is to first tune your Max Degree of Parallelism setting. Default of 0 means SQL Server can use all available CPU’s for parallel processing and there by exhausting all the worker threads.

There are many reasons that can lead to exhaustion of worker threads :

  • Extensive long blocking chains causing SQL Server to run out of worker threads
  • Extensive parallelism also leading to exhaustion of worker threads
  • Extensive wait for any type of "lock" - spinlocks, latches. An orphaned spinlock is an example.

Refer to my answer here that will show you how you can calculate MAXDOP value for your server instance.

Also, highly recommend you to start collecting Wait stats information about your database server instance.