Linux – How to disable memory for a NUMA node on a Linux system

linux, memory, numa, rhel

Is there a way to disable access to memory associated with a given NUMA node/socket on a NUMA machine?

We have a bit of controversy with the database vendor about our HP DL560 machines. The DB sales type’s technical support person was adamant that we could not use our DL560s but had to buy new DL360s since they have fewer sockets. I believe their concern is the speed of accessing inter-socket memory. They recommended that if I insisted on keeping the DL560s, I should leave two of the sockets empty. I think they are mistaken (AKA crazy) but I need tests to demonstrate that I am on solid ground.

My configuration:
The machines have four sockets, each of which has 22 hyperthreaded physical cores, for a total of 176 apparent cores, with a total of 1.5 TB of memory.
The operating system is Red Hat Enterprise Linux Server release 7.4.

The lscpu display reads (in part):

$ lscpu | egrep 'NUMA|ore'
Thread(s) per core:    2
Core(s) per socket:    22
NUMA node(s):          4
NUMA node0 CPU(s):     0-21,88-109
NUMA node1 CPU(s):     22-43,110-131
NUMA node2 CPU(s):     44-65,132-153
NUMA node3 CPU(s):     66-87,154-175

If I had access to the physical hardware, I would consider pulling the processors from two of the sockets to prove my point but I don’t have access and I don’t have permission to go monkeying around with the hardware anyway.

The next best thing would be to virtually disable the sockets using the operating system. I read that I can take a processor out of service with

echo 0 > /sys/devices/system/cpu/cpu3/online

and, indeed, the processors are out of service, but that says nothing about the memory.

I just turned off all the processors for socket #3 (using lscpu to find which ones belong to socket #3) with:

for num in {66..87} {154..175}
do
    echo 0 > /sys/devices/system/cpu/cpu${num}/online
    cat /sys/devices/system/cpu/cpu${num}/online
done
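
Rather than copying the CPU ranges out of lscpu by hand, the same list can probably be read from sysfs. This is only a sketch, assuming the kernel exposes /sys/devices/system/node/nodeN/cpulist; the function names expand_cpulist and offline_node_cpus are made up for illustration and are not part of any tool:

```shell
# Expand a kernel-style CPU list such as "66-87,154-175"
# into one CPU number per line.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        # A single CPU ("5") has no range end; treat it as "5-5".
        seq "$first" "${last:-$first}"
    done
}

# Take every CPU of one NUMA node offline (needs root).
# Assumption: the node's CPUs are listed in nodeN/cpulist in sysfs.
offline_node_cpus() {
    node_dir=/sys/devices/system/node/node$1
    for num in $(expand_cpulist "$(cat "$node_dir/cpulist")"); do
        echo 0 > "/sys/devices/system/cpu/cpu${num}/online"
    done
}
```

Running offline_node_cpus 3 as root should be equivalent to the loop above. Like the loop, it only removes the CPUs from scheduling and does nothing about the node's memory. CPU 0 usually cannot be taken offline on x86, so this is unlikely to work for node 0.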

and got:

$ grep N3 /proc/$$/numa_maps
7fe5daa79000 default file=/usr/lib64/libm-2.17.so mapped=16 mapmax=19 N3=16 kernelpagesize_kB=4

Which, if I am reading this correctly, shows my current process is using memory in socket #3. Except the shell was already running when I turned off the processors.

I started a new process that does its best to gobble up memory and

$ cat /proc/18824/numa_maps | grep N3

returns no records initially, but after gobbling up memory for a long time, it starts using memory on node 3.
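
To watch this happen without eyeballing the grep output, the per-node page counts in numa_maps can be added up. A rough sketch, assuming the N<node>=<pages> field format shown in the output above; pages_on_node is a hypothetical helper name:

```shell
# Sum the pages a process holds on one NUMA node.
# Reads numa_maps text on stdin; $1 is the node number.
# Each mapping line carries fields such as "N3=16" (16 pages on node 3).
pages_on_node() {
    awk -v key="N$1=" '
        {
            for (i = 1; i <= NF; i++)
                if (index($i, key) == 1)
                    total += substr($i, length(key) + 1)
        }
        END { print total + 0 }'
}
```

For example, pages_on_node 3 < /proc/18824/numa_maps prints a single number that stays at 0 until the process starts spilling onto node 3. The count is in pages of whatever size each mapping uses (see kernelpagesize_kB), so it is only a rough measure if huge pages are involved.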

I tried running my program with numactl, binding it to nodes 0, 1 and 2, and it works as expected … except I don’t have control over the vendor's software, and there is no provision in Linux to set the memory policy of another process the way numactl uses set_mempolicy to set its own.
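
For reference, the numactl binding described here looks roughly like the following. This is a sketch only: run_on_nodes and the DRY_RUN switch are made up for illustration, and ./gobble stands in for the memory-gobbling test program; --cpunodebind and --membind are the real numactl options.

```shell
# Run a command with both its CPUs and its memory restricted to the
# given NUMA nodes. $1 is the node list (e.g. 0-2), the rest is the command.
# Set DRY_RUN=1 to print the numactl command line instead of running it.
run_on_nodes() {
    nodes=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo numactl --cpunodebind="$nodes" --membind="$nodes" "$@"
    else
        numactl --cpunodebind="$nodes" --membind="$nodes" "$@"
    fi
}
```

With this, run_on_nodes 0-2 ./gobble keeps the test program off node 3, but only because numactl is the one that starts it.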

Short of physically removing the processors, is there a way to force the issue?

Best Answer

"I believe their concern is the speed of accessing inter-socket memory. They recommended that if I insisted on keeping the DL560s, I should leave two of the sockets empty."

This has to do with the number of QPI or UPI links between the CPUs and the Intel scalability level (because you mentioned Xeon), whether it is 4S, S4S, or S8S. But the fact that there are four sockets means you should be able to access RAM anywhere at a reasonable speed (S4S would be faster than 4S), and even in the worst case it would be orders of magnitude faster than accessing disk or some other kind of PCIe storage.

For a given process running on some specific core on CPU 0, 1, 2, or 3 in a quad-socket system, the fastest RAM access is to the pool of RAM chips hanging off that CPU's memory controller. If it has to hop over a QPI/UPI link to some other CPU to get to that RAM, access is slower and not optimal. But you have to weigh all of that against not having enough shared RAM in the first place.

Yes, there is a way to force the issue, and it's with cpuset:

cpuset - confine processes to processor and memory node subsets

The cpuset filesystem is a pseudo-filesystem interface to the kernel cpuset mechanism, which is used to control the processor placement and memory placement of processes. It is commonly mounted at /dev/cpuset.
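
A minimal sketch of how that could be applied to processes that are already running. The assumptions here are mine and worth checking on your system: it uses the cgroup v1 cpuset controller, which RHEL 7 normally mounts at /sys/fs/cgroup/cpuset, with its cpuset.-prefixed file names (a legacy mount at /dev/cpuset may drop the prefix). The function name confine_to_nodes and the set name vendor_db are hypothetical.

```shell
# Confine already-running processes to a subset of CPUs and memory nodes
# (needs root on a real system).
#   $1 = cpuset mount point   $2 = name of the new set
#   $3 = allowed CPUs         $4 = allowed memory nodes
#   remaining arguments = PIDs to move into the set
confine_to_nodes() {
    set_dir=$1/$2
    cpus=$3
    mems=$4
    shift 4
    mkdir -p "$set_dir" || return 1
    # Both lists must be filled in before any task can be attached.
    echo "$cpus" > "$set_dir/cpuset.cpus" || return 1
    echo "$mems" > "$set_dir/cpuset.mems" || return 1
    # Ask the kernel to migrate pages that were already allocated on
    # other nodes when a task joins the set.
    echo 1 > "$set_dir/cpuset.memory_migrate" || return 1
    for pid in "$@"; do
        # cgroup.procs moves the whole process (all of its threads);
        # the kernel expects one PID per write.
        echo "$pid" > "$set_dir/cgroup.procs" || return 1
    done
}
```

With the lscpu layout from the question, nodes 0 to 2 own CPUs 0-65 and 88-153, so something like confine_to_nodes /sys/fs/cgroup/cpuset vendor_db 0-65,88-153 0-2 followed by the vendor's PIDs should keep both their threads and their new allocations off node 3. Children forked afterwards stay in the same set, which is what makes this usable for software you cannot start under numactl yourself.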
