Sql-server – NUMA Nodes – MAXDOP – PLE

maxdopmemorynumasql serversql-server-2005

We have a server with 8 CPUS across 2 NUMAs with hyperthreading enabled. Currently Maxdop is set to 8, but actually should be set to 4 as per the Maxdop section of this article:

https://support.microsoft.com/en-us/kb/322385

So we need to change it to 4.

However my question is what is the impact of having maxdop set to 8? So going parallel across two NUMAS ? The reason I ask is that we just had a strange issue where queries were very slow to return and PLE dropped off rapidly.

Even when nothing was running against SQL PLE didn't not improve. CXPACKET wait types went up. Then all of a sudden CXPACKET wait type dropped completely and PLE began to rise and now has returned to normal.

Throughout this time small queries were being executed against a database, however it was not the case that one query finished causing the CXPACKET wait type to drop off and PLE to rise again – we don't know what causes that.

A possible explanation is the incorrect MAXDOP setting.

Can anyone explain to me the impact of parallel executions across NUMA nodes, is it just the same as exhausting worker threads and slower access time when using foreign memory ?

Thank you

Best Answer

SQL SERVER 2005 SP3

Your best bet first is to upgrade it to a supported SP - First Download SP4 then Download CU3 --> Build NO: 9.00.5266. Support for SQL server 2005 is ending next year (2016/04/12) even for the latest SP and CU - so better upgrade to a 2012/2014.

You can use my script to help you suggest a good MAXDOP value.

Can anyone explain to me the impact of parallel executions across NUMA nodes, is it just the same as exhausting worker threads and slower access time when using foreign memory ?

SQL server is NUMA aware. This means that SQL server knows what NUMA node the processors are in and what NUMA node the memory is in. Due to this, SQL server will assign worker thread on correct NUMA node for the data that you want to access.

Refer to :

Below are some queries that will help you find out if there is an imbalance of schedulers between the NUMA nodes - which can lead to significant performance problems under load :

SELECT 
  parent_node_id,
  scheduler_id, 
  [cpu_id], 
  is_idle, 
  current_tasks_count, 
  runnable_tasks_count, 
  active_workers_count, 
  load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';

--- check the current tasks, runnable tasks, active workers and avg load factor COUNT ...
SELECT 
  parent_node_id, 
  SUM(current_tasks_count) AS current_tasks_count, 
  SUM(runnable_tasks_count) AS runnable_tasks_count, 
  SUM(active_workers_count) AS active_workers_count, 
  AVG(load_factor) AS avg_load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id;

You should monitor SQL Server:Buffer Node and SQL Server, Memory Node as opposed to just PLE - monitor the individual Buffer Node:Page life expectancy counters (there will be one Buffer Node performance object for each NUMA node).

exhausting worker threads

IF you set the MAXDOP setting as default (which is 0) then it could lead to worker thread starvation.