Sql-server – SQL Server instance running out of worker threads

sharepointsql serversql-server-2008-r2

I have a SQL Server 2008 R2 SP3 Standard Edition 64-bit instance on 8 cores (576 max worker threads) with 32 GB RAM (MaxMem = 28000). It is the data store for a SharePoint installation, with 218 databases.

It was getting dozens of "SQL Server failed with error code 0xc0000000 to spawn a thread to process a new login or connection." errors per day, but no other errors. I found that MAXDOP = 0, which is bad for SharePoint. I gradually (over weeks) brought MAXDOP down to 1. As I did so, the frequency of those errors went down to zero on most days. But I still see them once in a while.

sys.dm_os_wait_stats has this to say about THREADPOOL waits:

waiting_tasks_count wait_time_ms    max_wait_time_ms    signal_wait_time_ms
26149               474516          4428                9

The server was last restarted at Mar 25 2018 5:55PM and current server time is Apr 20 2018 10:07PM. sp_Blitz finds nothing interesting other than use of join and order hints and slow storage writes on the drives holding tempdb files.

This is on a VM in a private cloud. Increasing the number of CPUs will be very expensive, and while it is heavily used, CPU usage doesn't seem to be a problem. In this case, would increasing the max worker threads be a reasonable thing to try, should I just leave it alone and live with the occasional 17189 error, or is there another option?

Best Answer

Increasing MWT can lead to increased SOS_SCHEDULER_YIELD waits. Not the end of the world, but think of it like adding a buncha kids to a teacher's class. All of a sudden, it's gonna be harder for each kid to get attention.

When a process exhausts its 4ms quantum, there will potentially be more threads ahead of it waiting to get on the CPU. It's really difficult to say if performance will be worse with that trade off.

You could try increasing MWT a incrementally.

This is how SQL Server manages it:

NUTS

Notice how doubling core counts doesn't double MWT, and you get the same number with 1 core as you do with 4 cores?

It's like a sign, or something.

The equation is: 512 + ((logical CPUs - 4) * 16), which means at 10 cores you'd have 608 threads, and at 12 cores you'd have 640 threads.

Those are reasonably safe increments to try, but I wouldn't go over that without a support call with Microsoft.

Hope this helps!