Note that all servers use the same NICs - Broadcom 5709Cs with
up-to-date drivers. The servers themselves are Dell R610s.
Kyle Brandt has an opinion on Broadcom network cards which echoes my own (repeated) experience.
Broadcom, Die Mutha
My problems have always been associated with TCP Offload features, and in 99% of cases disabling those features or switching to another network card has resolved the symptoms. One client that (as in your case) uses Dell servers always orders separate Intel NICs and disables the on-board Broadcom cards at build time.
As described in this MSDN blog post, I would start with disabling in the OS with:
netsh int ip set chimney DISABLED
IIRC, it may be necessary to disable the features at the card driver level in some circumstances; it certainly won't hurt to do so.
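As a sketch, the equivalent global switches on Windows Server 2008 and later look like the following (the older `netsh int ip set chimney` form above is the Server 2003-era syntax; exact option availability varies by OS version and driver):

```
rem Disable TCP Chimney Offload and related offload features globally.
netsh int tcp set global chimney=disabled
netsh int tcp set global rss=disabled
netsh int ip set global taskoffload=disabled

rem Verify the resulting settings:
netsh int tcp show global
```

Driver-level offload settings (Large Send Offload, checksum offload, etc.) live in the NIC's Advanced properties in Device Manager, so they have to be toggled per adapter there.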
This is an awesome post.
To answer your final question, I'd speculate that your answer is "yes".
That said, I probably would have pursued soft-NUMA before resorting to the trace flags. I think you are right about the NUMA node allocation, and that could be at the root of your problem. Via soft-NUMA you could scale out the requests across your NUMA nodes (four, if that's the correct count), and then assign each host, via IP address, to a specific NUMA node. In addition, I'd disable hyper-threading. Combined, these changes would likely reduce the issue, though at the cost of fewer schedulers.
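For SQL Server 2008/2008 R2, soft-NUMA is configured in the registry rather than in T-SQL; a rough sketch for splitting 16 CPUs into 4 soft-NUMA nodes might look like the following (the key path, masks, and port numbers are illustrative assumptions, not taken from your environment):

```
HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node0
    CPUMask (DWORD) = 0x0000000F    ; CPUs 0-3
...\Node1  CPUMask = 0x000000F0    ; CPUs 4-7
...\Node2  CPUMask = 0x00000F00    ; CPUs 8-11
...\Node3  CPUMask = 0x0000F000    ; CPUs 12-15
```

Connections are then affinitized per node via the TCP/IP properties in SQL Server Configuration Manager (the `TcpPort` value accepts a node bitmap per port, e.g. `1433[0x1],1434[0x2],1435[0x4],1436[0x8]`), and each application host is pointed at its assigned port or IP. A restart of the instance is required for the node configuration to take effect.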
On a separate note, I'd look at forced parameterization; the fact that your workload is driving your CPU so high is very interesting, and it may be worth investigating.
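Forced parameterization is a database-level setting; a minimal sketch, with the database name as a placeholder:

```sql
-- "YourDatabase" is a placeholder. Forced parameterization makes SQL Server
-- parameterize literal values in ad hoc statements, which cuts compile
-- activity when the workload is many near-identical queries.
ALTER DATABASE YourDatabase SET PARAMETERIZATION FORCED;

-- To revert to the default behavior:
-- ALTER DATABASE YourDatabase SET PARAMETERIZATION SIMPLE;
```

It can change plan quality for queries that benefit from literal-specific plans, so test it against your workload before applying it in production.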
Lastly, on multi-NUMA-node systems I typically have the output of the following queries dumped to a table every N seconds. That makes for some interesting analysis when workloads change or trace flags are implemented:
SELECT GETDATE() AS poll_time, node_id, node_state_desc, memory_node_id,
       online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC'
and
SELECT TOP 10 GETDATE() AS sample_poll, wait_type, COUNT(*)
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] NOT IN
    ('CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK', 'SLEEP_SYSTEMTASK',
     'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'BROKER_TASK_STOP',
     'BROKER_RECEIVE_WAITFOR', 'OLEDB', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT')
GROUP BY wait_type
ORDER BY COUNT(*) DESC
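One way to wire up the "dump to a table every N seconds" part is a SQL Agent job running an INSERT like the sketch below; the table name and column types are assumptions (the types mirror what `sys.dm_os_nodes` exposes), so adjust to taste:

```sql
-- Placeholder monitoring table; create once.
CREATE TABLE dbo.node_poll (
    poll_time              datetime      NOT NULL,
    node_id                smallint      NOT NULL,
    node_state_desc        nvarchar(256) NOT NULL,
    memory_node_id         smallint      NULL,
    online_scheduler_count smallint      NULL,
    active_worker_count    int           NULL,
    avg_load_balance       int           NULL,
    idle_scheduler_count   smallint      NULL
);

-- Run this from an Agent job scheduled every N seconds.
INSERT INTO dbo.node_poll
SELECT GETDATE(), node_id, node_state_desc, memory_node_id,
       online_scheduler_count, active_worker_count,
       avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC';
```

The same pattern works for the wait-stats query; just add a second destination table keyed on `sample_poll` and `wait_type`.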
Best Answer
The sp_WhoIsActive documentation says:
However, I need to point out a few things in your question:
In your case, I'd start by troubleshooting the lead blocker. Don't worry about the queries waiting to compile - they haven't even started yet. In your scenario, the last thing you should be doing is starting even more queries, which are likely to just get blocked anyway and consume more workspace memory while they wait. Get the lead blocker out of the way instead.
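sp_WhoIsActive itself can surface the lead blocker directly; `@find_block_leaders` and `@sort_order` are documented parameters of the procedure:

```sql
-- Walk the blocking chain and sort so that sessions blocking the most
-- other sessions (the lead blockers) float to the top.
EXEC sp_WhoIsActive
    @find_block_leaders = 1,
    @sort_order = '[blocked_session_count] DESC';
```

The session at the top of that output is the one to investigate (or, if necessary, kill) before anything else.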