Sql-server – automatic soft NUMA results in odd number of cores

numasql serversql-server-2016

I ran sp_blitz on one of our SQL Server 2016 production servers and one of the findings was that there were CPUs with an odd number of cores. The detailed message indicated this is a really bad NUMA configuration.

SQL Server 2016 will use automatic soft NUMA if it detects physical processors with more than 8 cores. Our particular server has two sockets with 10 CPUs per socket.

The startup messages in the SQL Server error log indicate:

Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 physical cores.

As a result SQL Server created four logical NUMA nodes with 5 cores each.

Is this a performance problem?

We're using MAXDOP = 4.

Is parallelism an issue here?

Best Answer

We can't answer your question of "Is there a performance problem?" Only you can answer that. Are you getting acceptable performance in production? I can tell you that I'm aware of a production system that gets acceptable performance with four sockets of 10 CPUs each with auto soft-NUMA enabled, so it's not like it's impossible to get acceptable performance need with soft-NUMA nodes having an odd number of schedulers.

If you suspect that there are performance gains to be had by changing your NUMA configuration then you should consider testing that. As far as I know it really all depends on your workload. The three choices that you have are to keep auto soft-NUMA, disable auto soft-NUMA, or to attempt a manual configuration with de-affinitizing CPUs or manual soft-NUMA by changing the registry. I'm not sure if I have all of the details right below but you might find information about what changes within SQL Server as you change your configuration to be helpful.

The number of lazy writers is determined by the number of physical NUMA nodes, so soft-NUMA won't change it. You can have up to four transaction log writers in SQL Server 2016, with one per NUMA node within SQL server. Those transaction log writers won't be spread evenly over your NUMA nodes: they are assigned to CPUs 1, 2, 3, and 4 as needed. You will have one resource monitor per logical NUMA node. I believe that you will also have 1 I/O completion thread per logical NUMA node, but I haven't specifically looked at that and don't know if there's a limit.

Maybe going through a few example configurations will help. Suppose you disable auto soft-NUMA. For that configuration you will have 2 logical NUMA nodes of 10 schedulers in SQL Server, 2 resource monitors, 2 lazy writers, 2 I/O completion threads, and 2 transaction log writers. You would no longer see the sp_blitz finding.

Now suppose you enable auto soft-NUMA. For that configuration you will have 4 logical NUMA nodes of 5 schedulers in SQL Server, 4 resource monitors, 2 lazy writers, 4 I/O completion threads, and 4 transaction log writers.

With manual soft-NUMA you could do something like create 3 soft-NUMA nodes of 2 schedulers, 4 schedulers, and 4 schedulers per socket. That would also avoid the sp_blitz finding. You could also make 2 CPUs per socket not visible to SQL Server in which case you'd have 8 schedulers per socket and auto soft-NUMA would not apply. Both of those sound like rather odd configurations to me.

So which of those configurations is best suited for your workload? Would your workload benefit from having 4 transaction log writers instead of 2? Would your workload be harmed by having 4 resource monitors instead of 2? I have no idea. All you can do is benchmark it. I haven't been able to find any information from Microsoft as to when auto soft-NUMA could be harmful for performance. Their point of view seems to be that it generally improves performance and scalability and to leave it on unless you experience a performance problem. IMO sending MAXDOP 4 queries to 4 soft-NUMA nodes of 5 schedulers doesn't sound bad to me at all.