Linux – Difference Between tmpfs, shm, and Hugepages

filesystemslinuxmemoryshared memorytmpfs

I've been curious lately about the various Linux kernel memory based filesystems.

Note: As far as I'm concerned, the questions below should be considered more or less optional when compared with a better understanding of that posed in the title. I ask them below because I believe answering them can better help me to understand the differences, but as my understanding is admittedly limited, it follows that others may know better. I am prepared to accept any answer that enriches my understanding of the differences between the three filesystems mentioned in the title.

Ultimately I think I'd like to mount a usable filesystem with hugepages, though some light research (and still lighter tinkering) has led me to believe that a rewritable hugepage mount is not an option. Am I mistaken? What are the mechanics at play here?

Also regarding hugepages:

     uname -a
3.13.3-1-MANJARO \
#1 SMP PREEMPT \
x86_64 GNU/Linux

    tail -n8 /proc/meminfo
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     8223772 kB
DirectMap2M:    16924672 kB
DirectMap1G:     2097152 kB

(Here are full-text versions of /proc/meminfo and /proc/cpuinfo )

What's going on in the above? Am I already allocating hugepages? Is there a difference between DirectMap memory pages and hugepages?

Update After a bit of a nudge from @Gilles, I've added 4 more lines above and it seems there must be a difference, though I'd never heard of DirectMap before pulling that tail yesterday… maybe DMI or something?

Only a little more…

Failing any success with the hugepages endeavor, and assuming harddisk backups of any image files, what are the risks of mounting loops from tmpfs? Is my filesystem being swapped the worst-case scenario? I understand tmpfs is mounted filesystem cache – can my mounted loopfile be pressured out of memory? Are there mitigating actions I can take to avoid this?

Last – exactly what is shm, anyway? How does it differ from or include either hugepages or tmpfs?

Best Answer

There is no difference betweem tmpfs and shm. tmpfs is the new name for shm. shm stands for SHaredMemory.

See: Linux tmpfs.

The main reason tmpfs is even used today is this comment in my /etc/fstab on my gentoo box. BTW Chromium won't build with the line missing:

# glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for 
# POSIX shared memory (shm_open, shm_unlink). 
shm                     /dev/shm        tmpfs           nodev,nosuid,noexec     0 0

which came out of the linux kernel documentation

Quoting:

tmpfs has the following uses:

1) There is always a kernel internal mount which you will not see at
all. This is used for shared anonymous mappings and SYSV shared
memory.

This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not set, the user visible part of tmpfs is not build. But the internal
mechanisms are always present.

2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
POSIX shared memory (shm_open, shm_unlink). Adding the following
line to /etc/fstab should take care of this:

tmpfs /dev/shm tmpfs defaults 0 0

Remember to create the directory that you intend to mount tmpfs on if necessary.

This mount is not needed for SYSV shared memory. The internal
mount is used for that. (In the 2.3 kernel versions it was
necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
shared memory)

3) Some people (including me) find it very convenient to mount it
e.g. on /tmp and /var/tmp and have a big swap partition. And now
loop mounts of tmpfs files do work, so mkinitrd shipped by most
distributions should succeed with a tmpfs /tmp.

4) And probably a lot more I do not know about :-)

tmpfs has three mount options for sizing:

size: The limit of allocated bytes for this tmpfs instance. The default is half of your physical RAM without swap. If you oversize your tmpfs instances the machine will deadlock since the OOM handler will not be able to free that memory.
nr_blocks: The same as size, but in blocks of PAGE_CACHE_SIZE.
nr_inodes: The maximum number of inodes for this instance. The default is half of the number of your physical RAM pages, or (on a machine with highmem) the number of lowmem RAM pages, whichever is the lower.

From the Transparent Hugepage Kernel Doc:

Transparent Hugepage Support maximizes the usefulness of free memory if compared to the reservation approach of hugetlbfs by allowing all unused memory to be used as cache or other movable (or even unmovable entities). It doesn't require reservation to prevent hugepage allocation failures to be noticeable from userland. It allows paging and all other advanced VM features to be available on the hugepages. It requires no modifications for applications to take advantage of it.

Applications however can be further optimized to take advantage of this feature, like for example they've been optimized before to avoid a flood of mmap system calls for every malloc(4k). Optimizing userland is by far not mandatory and khugepaged already can take care of long lived page allocations even for hugepage unaware applications that deals with large amounts of memory.

New Comment after doing some calculations:

HugePage Size: 2MB
HugePages Used: None/Off, as evidenced by the all 0's, but enabled as per the 2Mb above.
DirectMap4k: 8.03Gb
DirectMap2M: 16.5Gb
DirectMap1G: 2Gb

Using the Paragraph above regarding Optimization in THS, it looks as tho 8Gb of your memory is being used by applications that operate using mallocs of 4k, 16.5Gb, has been requested by applications using mallocs of 2M. The applications using mallocs of 2M are mimicking HugePage Support by offloading the 2M sections to the kernel. This is the preferred method, because once the malloc is released by the kernel, the memory is released to the system, whereas mounting tmpfs using hugepage wouldn't result in a full cleaning until the system was rebooted. Lastly, the easy one, you had 2 programs open/running that requested a malloc of 1Gb

For those of you reading that don't know a malloc is a Standard Structure in C that stands for Memory ALLOCation. These calculations serve as proof that the OP's correlation between DirectMapping and THS maybe correct. Also note that mounting a HUGEPAGE ONLY fs would only result in a gain in Increments of 2MB, whereas letting the system manage memory using THS occurs mostly in 4k blocks, meaning in terms of memory management every malloc call saves the system 2044k(2048 - 4) for some other process to use.

Further investigations

Following my gut feeling that zram was behind this behavious, I setted up a VM with similar spec as your machine: 4 GB RAM and 2 GB zram swap, no swap file.

I have loaded the VM with heavy weight applications and got the following state:

huygens@ubuntu:~$ smem -wt -K ~/vmlinuz-3.2.0-38-generic.unpacked -R 4096M
Area                           Used      Cache   Noncache 
firmware/hardware            130717          0     130717 
kernel image                  13951          0      13951 
kernel dynamic memory       1063520     922172     141348 
userspace memory            2534684     257136    2277548 
free memory                  451432     451432          0 
----------------------------------------------------------
                            4194304    1630740    2563564 
huygens@ubuntu:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          3954       3528        426          0         79        858
-/+ buffers/cache:       2589       1365
Swap:         1977          0       1977

As you can see free reports 858 MB cache memory and that is also what smem seems to report within the cached kernel dynamic memory.

Then I further stressed the system using Chromium Browser. At the beginning, it was only have 83 MB of swap used. But then after a few more tabs opened, the swap switch quickly to almost it's maximum and I experienced OOM! zram has really a dangerous side where wrongly configured (too big sizes) it can quickly hit you back like a trebuchet-like mechanism.

At that time I had the following outputs:

huygens@ubuntu:~$ smem -wt -K ~/vmlinuz-3.2.0-38-generic.unpacked -R 4096M
Area                           Used      Cache   Noncache 
firmware/hardware            130717          0     130717 
kernel image                  13951          0      13951 
kernel dynamic memory       1355344     124072    1231272 
userspace memory             961004      36456     924548 
free memory                 1733288    1733288          0 
----------------------------------------------------------
                            4194304    1893816    2300488 
huygens@ubuntu:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          3954       2256       1698          0          4        132
-/+ buffers/cache:       2118       1835
Swap:         1977       1750        227

See how the kernel dynamic memory (columns cache and non-cache) look like inverted? It is because in the first case, the kernel had "cached" memory such as reported by free but then it had swap memory held by zram which smem does not know how to compute (check smem source code, zram occupation is not reported in /proc/meminfo, this it is not computed by smem which does simple "total kernel mem" - "type of memory reported by meminfo that I know are cache", what it does not know is that in the computed total kernel mem it has added the size of the swap which is in RAM!)

When I was in this state, I activated a hard disk swap and turned off the zram swap and I reset the zram devices: echo 1 > /sys/block/zram0/reset.

After that the noncache kernel memory melted like snow in summer and returned to "normal" value.

Conclusion

smem does not know about zram (yet) maybe because it is still staging and thus not part of /proc/meminfo which reports global parameters (like (in)active pages size, total memory) and then only report on a few specific parameters. smem identified a few of this specific parameters as "cache", sum them up and compare that to total memory. Because of that zram used memory gets counted in the noncache column.

Note: by the way, in modern kernel, meminfo reports also the shared memory consumed. smem does not take that yet into account, so even without zram the output of smem is to consider carefully esp. if you use application that make big use of shared memory.

References used:

Best Answer

Related Solutions

What are the differences in shared memory between early and modern Unix systems

Linux – “kernel dynamic memory” as reported by smem

Further investigations

Conclusion

Related Question