Linux – Discrepancy between reported used memory and sum of application memory usage

linux · memory · memory-leaks · process · virtual-memory

I'm running a desktop system that quite regularly suffers from a lack of memory, which prompted me to investigate what causes the issue in the first place.

The problem is that no single process eats the memory, yet the system doesn't show it as available. What's more, the system does swap, so the memory pressure looks real. What's puzzling is that usage returns to normal (~1 GB used) after I log out and back in, so it looks like some odd interaction between userland and the kernel rather than a memory leak.

In short:

  • memory reported as used by free, excluding cache/buffers: 3173960 kB
  • sum of USS of all applications: 2413952 kB
  • SLAB size: 158968 kB
  • zram (after compression): 75992 kB

That gives 3173960 - 2413952 - 158968 - 75992 = 525048 kB of unaccounted memory usage.
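The arithmetic above can be reproduced directly; the figures are the ones quoted in the question:

```shell
# Accounting from the question, all values in kB.
used_minus_cache=3173960   # free -k, "-/+ buffers/cache" used column
uss_total=2413952          # smem -t, USS total
slab=158968                # /proc/meminfo Slab
zram_compr=75992           # sum of /sys/block/zram*/compr_data_size
echo "$(( used_minus_cache - uss_total - slab - zram_compr )) kB unaccounted"
```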

What am I missing or not counting?


Sum of applications memory usage:

# smem -t | sed -n '1p;$p'
  PID User     Command                         Swap      USS      PSS      RSS 
  108 6                                      244524  2413952  2461340  2648488

Memory usage as reported by free:

# free -k
             total       used       free     shared    buffers     cached
Mem:       4051956    3449748     602208          0      26548     249240
-/+ buffers/cache:    3173960     877996
Swap:      4051952     242592    3809360

General memory statistic:

# cat /proc/meminfo 
MemTotal:        4051956 kB
MemFree:          612260 kB
Buffers:           26636 kB
Cached:           249304 kB
SwapCached:       107892 kB
Active:          1774004 kB
Inactive:         885268 kB
Active(anon):    1712484 kB
Inactive(anon):   710788 kB
Active(file):      61520 kB
Inactive(file):   174480 kB
Unevictable:        9332 kB
Mlocked:            9332 kB
SwapTotal:       4051952 kB
SwapFree:        3809368 kB
Dirty:                40 kB
Writeback:             0 kB
AnonPages:       2343292 kB
Mapped:            95288 kB
Shmem:             36396 kB
Slab:             158968 kB
SReclaimable:      53900 kB
SUnreclaim:       105068 kB
KernelStack:        3528 kB
PageTables:        43600 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6077928 kB
Committed_AS:    4013288 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      139852 kB
VmallocChunk:   34359570976 kB
HardwareCorrupted:     0 kB
AnonHugePages:    641024 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     2310848 kB
DirectMap2M:     1882112 kB
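Several of these lines are RAM consumers that no process's USS accounts for. A quick sketch summing a few of them, with the values pasted from the output above (on a live system you would read /proc/meminfo directly):

```shell
# SwapCached, PageTables and KernelStack are kernel-side RAM users that
# never appear in smem's per-process USS. Values pasted from above.
awk '{ sum += $2; print $1, $2, "kB" } END { print "sum:", sum, "kB" }' <<'EOF'
SwapCached: 107892
PageTables: 43600
KernelStack: 3528
EOF
```

That accounts for part of the gap, though not all of it.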

Swaps are on zram:

# cat /proc/swaps 
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       2025976 121252  100
/dev/zram1                              partition       2025976 121324  100

# awk ' { print $0 / 1024; sum+=$0 } END { print "sum:" sum/1024 } ' /sys/block/zram*/compr_data_size
37962.4
38030.1
sum:75992.5
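Note that compr_data_size is only the compressed payload; the allocator adds overhead, so zram's real RAM footprint is larger. On kernels that expose it (the sysfs layout varies between versions), mem_used_total reports the actual usage. A sketch of the same calculation, with made-up illustrative byte counts standing in for /sys/block/zram*/mem_used_total:

```shell
# Same awk as above, but aimed at mem_used_total rather than
# compr_data_size. The two sample values here are illustrative only,
# not from the question's system.
awk '{ print $0 / 1024; sum += $0 } END { print "sum:" sum / 1024 }' <<'EOF'
42991616
43120640
EOF
```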

Best Answer

The problem

You have 4 GB of RAM (physical memory) and two zram devices of 2,025,976 kB (roughly 2 GB) each. zram stores its compressed data in that same RAM. I don't know the internals exactly, but whatever the mechanism, I can clearly imagine a scenario where Linux pages out (= moves some memory from RAM to zram) to get more free space, but the zram usage in memory then grows, so it pages out further, which increases zram usage again, and so on until zram is consuming all your physical memory.

I guess there is a threshold on any system below which paging out won't stress the kernel to the point described above, so that zram improves performance.
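A toy model of that effect, assuming (hypothetically) that swapped pages compress to 40% of their original size: to genuinely free N kB, the kernel must page out N / (1 − 0.4) kB, and the compressed remainder stays resident in zram:

```shell
# Toy feedback model: netting `target` kB of free RAM when swapped
# pages keep `ratio_pct`% of their size resident in zram. Both numbers
# are assumptions for illustration.
target=102400        # want 100 MB genuinely freed
ratio_pct=40         # assumed compression: pages shrink to 40%
paged_out=$(( target * 100 / (100 - ratio_pct) ))
in_zram=$(( paged_out * ratio_pct / 100 ))
echo "paged out: $paged_out kB, held in zram: $in_zram kB"
```

The worse the data compresses, the more must be paged out per kB actually freed.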

Insights

When your system wants to swap out 100 MB, it puts those 100 MB in zram. Let's say the data compresses by 50%, to 50 MB. Your system wanted to free 100 MB, but only 50 MB actually got freed.

Linux is also clever in that when it has paged out chunks of memory (put them in swap) but needs them again, it can do an optimisation: it pages the memory back in but keeps the copy in swap as well, so that if it soon needs to page those parts out again it can avoid an expensive write to the swap device (this shows up as SwapCached in /proc/meminfo). So in your case, Linux may keep the 100 MB in zram while also putting them back in normal RAM, and the system consumes 150 MB for a while.

If this is repeated for bigger programs with less compressible data, it can quickly become a nightmare. Imagine a 300 MB chunk of RAM that gets paged out and occupies 120 MB in each zram swap: Linux wanted to free 300 MB for other purposes but has only freed 300 - 120 - 120 = 60 MB. It may then try to page out further pages, and so on, with the problem that you have two zram devices that can each use up to 2 GB of RAM, thus eating all your memory.
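The arithmetic from the scenario above, as a tiny helper (MB figures converted to kB):

```shell
# Net RAM freed by a page-out to zram = amount paged out minus the
# compressed copy that stays resident in RAM.
net_freed() { echo "$(( $1 - $2 )) kB freed"; }   # args in kB
net_freed 102400 51200     # 100 MB compressed to 50 MB
net_freed 307200 245760    # 300 MB leaving 2 x 120 MB in zram
```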

Conclusion and solution

So is zram crap? No, not at all. The problem is that you configured zram to a total size equal to your physical RAM, and that's the problem. IMHO you should not configure zram to use more than 25% of your physical RAM, which means you would still have to rely on a hard-disk swap once the zram swap fills up.

A simple solution would be to reduce both zram devices to 500 MB each and add a swap file of roughly 2-3 GB, allowing the kernel to move genuinely unused pages from zram to the swap file. The swap file won't use RAM and will diminish the pressure on it.
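A hedged sketch of that reconfiguration (device names, sizes, and the swap-file path are examples; a zram device must be reset before its disksize can be changed, and all of this requires root):

```shell
# Shrink both zram devices to 500 MB each, then add a 2 GB disk-backed
# swap file at lower priority so cold pages can leave RAM entirely.
swapoff /dev/zram0 /dev/zram1
echo 1 > /sys/block/zram0/reset
echo 1 > /sys/block/zram1/reset
echo 500M > /sys/block/zram0/disksize
echo 500M > /sys/block/zram1/disksize
mkswap /dev/zram0 && swapon -p 100 /dev/zram0
mkswap /dev/zram1 && swapon -p 100 /dev/zram1
dd if=/dev/zero of=/swapfile bs=1M count=2048   # 2 GB swap file
chmod 600 /swapfile
mkswap /swapfile && swapon -p 10 /swapfile      # lower priority than zram
```

With the priorities set this way, the kernel fills zram first and only spills to disk when zram is full.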

Some information on how to set your zram disk size.