Linux – Tracking down memory usage that isn't showing up in cache

linux, memory, memory-leaks, top

Before you get your pitchforks out, I am having trouble tracking down where the memory is going in Linux's caching system. I have seen Linux ate my RAM!, How to see top processes by actual memory usage?, and Correctly determining memory usage in Linux, but using those as examples, the numbers don't quite add up with what I'm seeing on my system.

For this system, I understand that the memory is probably "cached" or used by programs, but the numbers aren't coming anywhere close to adding up for me.

In top I see:

top - 09:04:09 up 19 days, 20:38,  2 users,  load average: 0.00, 0.10, 0.11
Tasks: 160 total,   1 running, 159 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65974296k total, 43507804k used, 22466492k free,   305336k buffers
Swap:  7548924k total,        0k used,  7548924k free,  1480836k cached

I get it, 43 GB of RAM "used" isn't really true; most of it is probably cached. So digging into it, I ran:

$ free -m
             total       used       free     shared    buffers     cached
Mem:         64428      38845      25582          0        298       1445
-/+ buffers/cache:      37101      27326
Swap:         7371          0       7371

So this shows that 37 GB of it is indeed used and only 1445 MB is cached (this is where I would have expected the cached value to be something more like 20000). The websites I linked above show the "cached" column usually being quite high.
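
As a quick sanity check, the "-/+ buffers/cache" row is just arithmetic on the "Mem:" row above it: used minus buffers minus cached. Nothing new here, only the numbers from the free -m output:

# values in MB, taken from the free -m output above
echo $((38845 - 298 - 1445))    # prints 37102 -- matches the 37101 above apart from MB rounding
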
So, digging further, I ran the following to check which applications were using memory.

$ sudo smem -t
  PID User     Command                         Swap      USS      PSS      RSS
21475 root     python /usr/bin/smem -t            0     6212     6234     6984
 2431 root     /opt/OV/lbin/perf/coda             0     7156     8060    12068
 2213 root     /opt/perf/bin/perfd                0    19056    19485    22032
20849 root     /opt/shiny-server/ext/node/        0    77244    77321    78616
21325 atpa     /usr/lib64/R/bin/exec/R --n        0  3729836  3733774  3739520
21287 atpa     /usr/lib64/R/bin/exec/R --n        0  4060136  4064074  4069820
-------------------------------------------------------------------------------
   63 11                                          0  7947984  7970344  8054032

So the two R processes are using ~8 GB of memory between them.
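
To make sure smem is not under-reporting, a rough cross-check (just a sketch; the Pss lines in /proc/*/smaps are the same data smem reads, and root is needed to see every process) is to sum PSS straight from the kernel. It should land near the ~8 GB total above:

# sum Pss across all processes (kB of userland memory); root is needed to read other users' maps
sudo grep -h '^Pss:' /proc/[0-9]*/smaps | awk '{ total += $2 } END { print total " kB" }'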

The other articles I linked above show Linux "reserving" memory and keeping it in cache (e.g. free -m showing a high cached value on the "Mem:" line). In my case, however, the memory seems to actually be in use, yet no application reports using it, and I can't track down where Linux is using (caching? reserving?) it.

Where is this memory going? I am assuming Linux is using it, but I can't determine where it is being utilized.

/proc/meminfo shows:

MemTotal:       65974296 kB
MemFree:        24191624 kB
Buffers:          305320 kB
Cached:          1480760 kB
SwapCached:            0 kB
Active:          7769776 kB
Inactive:         215532 kB
Active(anon):    6199392 kB
Inactive(anon):      476 kB
Active(file):    1570384 kB
Inactive(file):   215056 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       7548924 kB
SwapFree:        7548924 kB
Dirty:               116 kB
Writeback:             0 kB
AnonPages:       6172696 kB
Mapped:            47400 kB
Shmem:               652 kB
Slab:             255468 kB
SReclaimable:     225620 kB
SUnreclaim:        29848 kB
KernelStack:        1736 kB
PageTables:        18780 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    40536072 kB
Committed_AS:    6455352 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      247288 kB
VmallocChunk:   34359487760 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2586624 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:    67098624 kB
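
To see how much of the "used" memory the kernel itself can account for, a rough sketch is to add up the big consumers in /proc/meminfo and compare them against MemTotal minus MemFree (this ignores the smaller items, and VmallocUsed is not reliable on every kernel). With the values above, it leaves roughly 32 GB that nothing in meminfo explains:

# rough accounting: (MemTotal - MemFree) vs. the sum of the big consumers in /proc/meminfo
awk '/^(MemTotal|MemFree|Buffers|Cached|Slab|AnonPages|PageTables|KernelStack|VmallocUsed):/ { v[$1] = $2 }
     END {
         used  = v["MemTotal:"] - v["MemFree:"]
         known = v["Buffers:"] + v["Cached:"] + v["Slab:"] + v["AnonPages:"] \
               + v["PageTables:"] + v["KernelStack:"] + v["VmallocUsed:"]
         printf "used: %d kB, accounted for: %d kB, unaccounted: %d kB\n", used, known, used - known
     }' /proc/meminfo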

Best Answer

I think I found my issue...

My problem seems to have been VMware's memory ballooning system. Basically, this is a way for the host to apply memory pressure to the guest OS, reclaiming part of the guest's memory allocation when other VMs on the same host start using a significant amount of memory.

http://www.vfrank.org/2013/09/18/understanding-vmware-ballooning/

If you are using VMware, run the command:

vmware-toolbox-cmd stat balloon

This will show the amount of ballooned memory.

For me:

# vmware-toolbox-cmd stat balloon
32425 MB

Other sources: https://serverfault.com/questions/660080/detect-memory-ballooning-from-within-the-affected-vm
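
If vmware-toolbox-cmd is not installed, a cruder check from inside the guest (a sketch; the module name depends on how the tools were installed) is to look for the balloon driver itself:

# the balloon driver is typically vmw_balloon (in-kernel) or vmmemctl (VMware Tools build)
lsmod | grep -E 'vmw_balloon|vmmemctl'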

To validate that this is the issue, you can release ("unballoon") the memory.

Unballooning memory: https://serverfault.com/questions/528295/unballooning-ram-thats-been-ballooned-by-vmware
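
For completeness, one host-side way to rule ballooning out for a single VM (an assumption on my part rather than something from the links above -- check VMware's documentation for your version, and note the VM has to be powered off and back on) is to cap the balloon size in the VM's .vmx file:

# hypothetical .vmx snippet: cap the balloon at 0 MB so the host cannot reclaim guest memory this way
sched.mem.maxmemctl = "0"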