There is a lot of misunderstanding about CXPACKET. CXPACKET isn't the cause of your problems; it is a side effect. A CXPACKET wait means that one thread of a parallel query is waiting for another thread of that query to do something. Seeing a lot of CXPACKET waits doesn't necessarily mean there's a problem with the query itself; it means the problem is somewhere else. When you see CXPACKET waits, look at the other threads of the same SPID and see what they are waiting on besides CXPACKET. That other wait is the real problem.
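As a rough sketch of that technique (the session id 1234 below is a placeholder; substitute the SPID you are investigating), you can list the sibling threads of a parallel query and their non-CXPACKET waits like this:

```sql
-- Show every worker thread of one parallel query and what it is waiting on.
-- 1234 is a placeholder SPID; substitute the session you are investigating.
SELECT session_id,
       exec_context_id,         -- 0 = coordinator thread, >0 = parallel workers
       wait_type,
       wait_duration_ms,
       resource_description
FROM sys.dm_os_waiting_tasks
WHERE session_id = 1234
  AND wait_type <> N'CXPACKET'  -- the interesting waits hide behind CXPACKET
ORDER BY wait_duration_ms DESC;
```

Whatever wait type dominates this output (PAGEIOLATCH, LCK_M_*, etc.) is where to focus your tuning.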
To your specific issue, the reason the run times vary so wildly is probably that SQL Server is generating a different plan on some days because the statistics are out of date (making the job run long). Then you either update statistics manually (or via a job), or auto-update stats kicks in, the plan improves, and the job runs quickly again.
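A quick way to check that theory is to look at how stale the statistics are before the job runs. This is a sketch assuming SQL Server 2008 R2 SP2 / 2012 SP1 or later (where `sys.dm_db_stats_properties` exists); `dbo.FactSales` is a placeholder table name:

```sql
-- How stale are the statistics on one table?
-- dbo.FactSales is a placeholder; substitute your own table.
SELECT s.name                  AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter -- row changes since the last stats update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY sp.modification_counter DESC;

-- If they are stale, refresh them before the job runs:
UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;
```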
Once you've solved the stats problem you can start looking at other reasons the job runs slowly. The fact that you only have a single disk certainly isn't helping.
This is an awesome post.
To answer your final question, I'd speculate that your answer is "yes".
That said, I probably would have pursued soft-NUMA before resorting to the trace flags. I think you are right about the NUMA node allocation, and that could be at the root of your problem. Via soft-NUMA you could scale out the requests across your NUMA nodes (4, if that's the correct count) and then assign each host to a specific node by IP address. In addition, I'd disable hyper-threading. Combined, those changes would likely reduce the issue, though at the cost of fewer schedulers.
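To illustrate the host-to-node assignment: SQL Server lets you affinitize TCP ports to NUMA nodes by appending a node bitmask to the port in the TCP Port field of SQL Server Configuration Manager. The port numbers and masks below are hypothetical, assuming four nodes:

```
1433[0x1],1434[0x2],1435[0x4],1436[0x8]
```

Each mask bit selects a node (0x1 = node 0, 0x2 = node 1, and so on); each application host then connects to "its" port (e.g. `Server=myserver,1434`), pinning its connections to one node.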
On a separate note, I'd look at forced parameterization; the fact that your load is driving your CPU so high is very interesting and may be worth investigating.
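Enabling it is a one-line change. The database name below is a placeholder, and since forced parameterization affects every query in the database, test the effect on your workload first:

```sql
-- Parameterize ad hoc statements automatically so similar queries share a plan.
-- YourDatabase is a placeholder name; verify the impact before using in production.
ALTER DATABASE YourDatabase SET PARAMETERIZATION FORCED;
```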
Lastly, on multi-numa node systems, I typically have the output of the following queries dumping to a table every N seconds. Makes for some interesting analysis when workload changes or trace flags are implemented:
SELECT GETDATE() AS poll_time, node_id, node_state_desc, memory_node_id,
       online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC'
and
SELECT TOP 10 GETDATE() AS sample_poll, wait_type, COUNT(*) AS task_count
FROM sys.dm_os_waiting_tasks
WHERE wait_type NOT IN
    (N'CLR_SEMAPHORE', N'LAZYWRITER_SLEEP', N'RESOURCE_QUEUE', N'SLEEP_TASK', N'SLEEP_SYSTEMTASK',
     N'SQLTRACE_BUFFER_FLUSH', N'WAITFOR', N'BROKER_TASK_STOP',
     N'BROKER_RECEIVE_WAITFOR', N'OLEDB', N'CLR_MANUAL_EVENT', N'CLR_AUTO_EVENT')
GROUP BY wait_type
ORDER BY COUNT(*) DESC
Best Answer
Community Wiki answer generated from comments on the question by Aaron.
Unless you are having performance issues, a high percentage of CXPACKET waits may only be an indicator that a large portion of the queries are going parallel, and not actually a problem. High CPU could be an indicator, but from what you have explained I would start by looking at queries with long durations but low CPU. The CXPACKET wait is sometimes associated with a query waiting for all of its threads to finish before it can merge the results (data skew).
If you are able to modify the queries and procedures, you could set a higher MAXDOP for the warehouse tasks that need it, and set the global MAXDOP lower. However, I would only do that as a last resort. You only really want to make those types of explicit hints when you've exhausted all other possibilities or you can't make changes to the code, queries or database schemas.
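A sketch of that split, with illustrative values only (the MAXDOP numbers and the `dbo.FactSales` table are placeholders, not recommendations):

```sql
-- Lower the instance-wide default degree of parallelism...
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
RECONFIGURE;

-- ...and let the big warehouse queries that benefit opt back in with a hint.
SELECT CustomerID, SUM(SalesAmount) AS total_sales
FROM dbo.FactSales              -- placeholder table
GROUP BY CustomerID
OPTION (MAXDOP 8);
```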