Sql-server – NUMA Nodes – MAXDOP – PLE

maxdopmemorynumasql serversql-server-2005

We have a server with 8 CPUS across 2 NUMAs with hyperthreading enabled. Currently Maxdop is set to 8, but actually should be set to 4 as per the Maxdop section of this article:

https://support.microsoft.com/en-us/kb/322385

So we need to change it to 4.

However my question is what is the impact of having maxdop set to 8? So going parallel across two NUMAS ? The reason I ask is that we just had a strange issue where queries were very slow to return and PLE dropped off rapidly.

Even when nothing was running against SQL PLE didn't not improve. CXPACKET wait types went up. Then all of a sudden CXPACKET wait type dropped completely and PLE began to rise and now has returned to normal.

Throughout this time small queries were being executed against a database, however it was not the case that one query finished causing the CXPACKET wait type to drop off and PLE to rise again – we don't know what causes that.

A possible explanation is the incorrect MAXDOP setting.

Can anyone explain to me the impact of parallel executions across NUMA nodes, is it just the same as exhausting worker threads and slower access time when using foreign memory ?

Thank you

Best Answer

SQL SERVER 2005 SP3

Your best bet first is to upgrade it to a supported SP - First Download SP4 then Download CU3 --> Build NO: 9.00.5266. Support for SQL server 2005 is ending next year (2016/04/12) even for the latest SP and CU - so better upgrade to a 2012/2014.

You can use my script to help you suggest a good MAXDOP value.

Can anyone explain to me the impact of parallel executions across NUMA nodes, is it just the same as exhausting worker threads and slower access time when using foreign memory ?

SQL server is NUMA aware. This means that SQL server knows what NUMA node the processors are in and what NUMA node the memory is in. Due to this, SQL server will assign worker thread on correct NUMA node for the data that you want to access.

Refer to :

Below are some queries that will help you find out if there is an imbalance of schedulers between the NUMA nodes - which can lead to significant performance problems under load :

SELECT 
  parent_node_id,
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';

--- check the current tasks, runnable tasks, active workers and avg load factor COUNT ...
SELECT 
  parent_node_id, 
  SUM(current_tasks_count) AS current_tasks_count, 
  SUM(runnable_tasks_count) AS runnable_tasks_count, 
  SUM(active_workers_count) AS active_workers_count, 
  AVG(load_factor) AS avg_load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id;

You should monitor SQL Server:Buffer Node and SQL Server, Memory Node as opposed to just PLE - monitor the individual Buffer Node:Page life expectancy counters (there will be one Buffer Node performance object for each NUMA node).

exhausting worker threads

IF you set the MAXDOP setting as default (which is 0) then it could lead to worker thread starvation.

Related Solutions

Sql-server – SQL Server – Anyone use SUMA, trace flag 8048, or trace flag 8015

This is an awesome post.

To answer your final question, I'd speculate that your answer is "yes".

That said, I probably would have pursued soft numa before resorting to the trace flags. I think you are right about the numa node allocation and that's could be at the root of your problem. Via soft numa, you could scale out the requests, depending on your count of numa nodes (4?) - to 4, if that's the correct number, and then assign, via ip address, each host to a specific numa node, in addition to that, I'd disable hyper threading. Combined, the issue would likely decrease, however, it would do so at the cost of fewer schedulers.

On a seperate thought, I'd look at forced parameterization - the fact that your load is driving your CPU so high is very interesting and it may be worth looking into that.

Lastly, on multi-numa node systems, I typically have the output of the following queries dumping to a table every N seconds. Makes for some interesting analysis when workload changes or trace flags are implemented:

SELECT getdate() as poll_time, node_id, node_state_desc, memory_node_id, online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK) 
WHERE node_state_desc <> N'ONLINE DAC'

and

SELECT top 10 getdate() as sample_poll, wait_type, count (*)
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] NOT IN
('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' ) 
GROUP BY wait_type
ORDER BY COUNT (*) DESC

Sql-server – Low PLE on NUMA node 000, high on 001

If you have a read-intensive query running on one NUMA node (in this case, 0), then it can experience a lower page life expectancy relative to other NUMA nodes.

That's totally normal.

To see what queries are running right now, you can use Adam Machanic's excellent sp_WhoIsActive. It's totally free. Some folks even run it every X minutes and log the data into a table so they can go back to see what was running at the time PLE nosedived.

Best Answer

Related Solutions

Sql-server – SQL Server – Anyone use SUMA, trace flag 8048, or trace flag 8015

Sql-server – Low PLE on NUMA node 000, high on 001

Related Question