Linux: Total swap used = swap used by processes +

linux memory swap

So, I'm trying to investigate where the swap usage comes from on a system with high swap usage:

# free
             total       used       free     shared    buffers     cached
Mem:        515324     508800       6524          0       4852      27576
-/+ buffers/cache:     476372      38952
Swap:       983032     503328     479704

Adding up swap used per process:

# for proc in /proc/*; do cat $proc/smaps 2>/dev/null | awk '/Swap/{swap+=$2}END{print swap "\t'`readlink $proc/exe`'"}'; done | sort -n | awk '{total+=$1}/[0-9]/;END{print total "\tTotal"}'
0       /bin/gawk
0       /bin/sort
0       /usr/bin/readlink
28      /sbin/xxxxxxxx
52      /sbin/mingetty
52      /sbin/mingetty
52      /sbin/mingetty
52      /sbin/mingetty
56      /sbin/mingetty
56      /sbin/mingetty
60      /xxxxxxxxxxx
60      /usr/sbin/xxx
84      /usr/sbin/xxx
108     /usr/bin/xxx
168     /bin/bash
220     /sbin/init
256     /sbin/rsyslogd
352     /bin/bash
356     /bin/bash
360     /usr/sbin/sshd
496     /usr/sbin/crond
672     /usr/sbin/sshd
12972   /opt/jdk1.6.0_22/bin/java
80392   /usr/libexec/mysqld
311876  /opt/jdk1.6.0_22/bin/java
408780  Total

This gives a lower value than the total used swap: 408780 kB summed over processes, versus 503328 kB reported by free. Where is the remaining used swap space? Is it vmalloc()'ed memory inside the kernel? Something else? How can I identify it?

Output of meminfo:

# cat /proc/meminfo 
MemTotal:       515324 kB
MemFree:          6696 kB
Buffers:          5084 kB
Cached:          28056 kB
SwapCached:     157512 kB
Active:         429372 kB
Inactive:        65068 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       515324 kB
LowFree:          6696 kB
SwapTotal:      983032 kB
SwapFree:       478712 kB
Dirty:             100 kB
Writeback:           0 kB
AnonPages:      399456 kB
Mapped:           8792 kB
Slab:             7744 kB
PageTables:       1820 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   1240692 kB
Committed_AS:  1743904 kB
VmallocTotal:   507896 kB
VmallocUsed:      3088 kB
VmallocChunk:   504288 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB

Best Answer

The difference you are observing isn't actually due to swap space being unaccounted for. Your script silently skips every process whose binary is no longer present on disk, so the swap used by those processes never reaches your total.

Some kernels append the word "(deleted)" to the target of the /proc/*/exe symlink when the original executable for the process is no longer around.

For such a link, readlink prints something like "/path/to/bin (deleted)". When that output is substituted back into the awk program string, the space and parentheses cause a parse error in awk, so nothing is printed for that process. For example, do this:

for a in /proc/*/exe ; do readlink $a ; done | grep deleted

And you will see a few entries with "(deleted)" appended. If you looked at the swap usage for these entries, their total should account for the discrepancy you see, because the resulting awk errors prevent their usage from being calculated and included in the final total.

If you run your original command without redirecting stderr anywhere, you will probably notice a few "runaway string constant" errors. Those errors are a result of the above and you should not have ignored them.
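The failure can be reproduced without any deleted binary on the system. The following is a minimal sketch, assuming a POSIX shell that performs word splitting (bash, dash) and any common awk; the path stored in `exe` is a made-up stand-in for readlink output.

```shell
# Hypothetical readlink output for a process whose binary was deleted.
exe='/path/to/bin (deleted)'

# Spliced in unquoted, as the backticks in the original one-liner do, the
# value is split at the space. awk's program text then stops right after
# "/path/to/bin", in the middle of a string constant, and the rest becomes
# a separate command-line argument.
if awk 'END { print 0 "\t'$exe'" }' </dev/null 2>/dev/null; then
    result="awk ran"
else
    result="awk failed"
fi
echo "$result"
```

The exact wording of the error differs between awk implementations, which is why the sketch only checks the exit status.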

Ignoring other potential improvements to your original command, you could modify it by removing the " (deleted)", like this (note |awk '{print $1}' added to readlink output):

for proc in /proc/*; \
  do cat $proc/smaps 2>/dev/null | awk '/Swap/{swap+=$2}END{print swap "\t'`readlink $proc/exe|awk '{print $1}' `'" }'; \
done | sort -n | awk '{total+=$1}/[0-9]/;END{print total "\tTotal"}'

This use of awk to fix the output of readlink truncates the path at the first space, so it will break if the name contains spaces. You can use sed, or whatever method you prefer, to strip only the trailing " (deleted)".
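One possible sed-based variant is sketched below. It is not the original command: `swap_of` is a hypothetical helper name, and the path is handed to awk through the environment so that no character in it can end up inside the awk program text. The pattern is anchored as `/^Swap:/` because newer kernels also emit a `SwapPss:` line in smaps, which an unanchored `/Swap/` would count as well.

```shell
# Print "<swap kB><TAB><exe>" for one smaps-format file.
# usage: swap_of SMAPS_FILE EXE_LINK_TARGET   (hypothetical helper)
swap_of() {
    # Strip only a trailing " (deleted)", so paths containing spaces survive.
    exe=$(printf '%s\n' "$2" | sed 's/ (deleted)$//')
    # Pass the path via the environment instead of splicing it into the
    # awk program, so it cannot cause a syntax error.
    EXE=$exe awk '/^Swap:/ { swap += $2 }
                  END { printf "%d\t%s\n", swap, ENVIRON["EXE"] }' "$1"
}

for proc in /proc/[0-9]*; do
    [ -r "$proc/smaps" ] || continue
    swap_of "$proc/smaps" "$(readlink "$proc/exe")" 2>/dev/null
done | sort -n | awk '{ total += $1; print } END { printf "%d\tTotal\n", total }'
```

Run it as root if the total should include processes owned by other users.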

Bonus Info

By the way, you could just use smem -t. The "Swap" column displays what you want.

As for calculating it yourself, though, you can also get this information more directly from the VmSwap field in /proc/*/status (smaps requires some kernel support and isn't always available), and avoid having to redirect error output by using a proper filename pattern that avoids the errors to begin with:

for proc in /proc/[0-9]*; do \
  awk '/VmSwap/ { print $2 "\t'`readlink $proc/exe | awk '{ print $1 }'`'" }' $proc/status; \
done | sort -n | awk '{ total += $1 ; print $0 } END { print total "\tTotal" }'

If you don't need the actual binary and can deal with just having the process name, you can get everything from status:

for a in /proc/*/status ; do \
  awk '/VmSwap|Name/ { printf "%s ", $2 } END { print "" }' $a ; \
done | awk '{ total+=$2 ; print $0 } END { print "Total " total }'

And finally, if just having the PIDs suffices, you can just do it all with awk:

awk '/VmSwap/ { total += $2; print $2 "\t" FILENAME } END { print total "\tTotal" }' /proc/*/status
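To put the per-process sum next to the system-wide figure, the two can be computed side by side. This is a rough sketch, assuming the kernel exposes VmSwap in /proc/*/status; both function names are hypothetical. System-wide usage is taken as SwapTotal minus SwapFree from /proc/meminfo.

```shell
# Swap in use system-wide (kB), from a meminfo-format file.
system_swap_used() {
    awk '/^SwapTotal:/ { total = $2 } /^SwapFree:/ { free = $2 }
         END { print total - free }' "$1" 2>/dev/null
}

# Sum of the VmSwap fields (kB) of the given status-format files.
process_swap_used() {
    awk '/^VmSwap:/ { sum += $2 } END { print sum + 0 }' "$@" 2>/dev/null
}

sys=$(system_swap_used /proc/meminfo)
procs=$(process_swap_used /proc/[0-9]*/status)
echo "system: ${sys} kB  processes: ${procs} kB  difference: $((sys - procs)) kB"
```

With the meminfo values from the question, `system_swap_used` yields 983032 - 478712 = 504320 kB.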

Note:

Now this isn't to say that there aren't differences between free and smem (the latter being the same as your script). There are plenty (see, for example, https://www.google.com/search?q=smem+free, which has more than enough results on the first page to answer your questions about memory usage). But without a proper test, your specific situation cannot be addressed.

Further investigations

Following my gut feeling that zram was behind this behaviour, I set up a VM with a spec similar to your machine's: 4 GB RAM and 2 GB zram swap, no swap file.

I loaded the VM with heavyweight applications and got the following state:

huygens@ubuntu:~$ smem -wt -K ~/vmlinuz-3.2.0-38-generic.unpacked -R 4096M
Area                           Used      Cache   Noncache 
firmware/hardware            130717          0     130717 
kernel image                  13951          0      13951 
kernel dynamic memory       1063520     922172     141348 
userspace memory            2534684     257136    2277548 
free memory                  451432     451432          0 
----------------------------------------------------------
                            4194304    1630740    2563564 
huygens@ubuntu:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          3954       3528        426          0         79        858
-/+ buffers/cache:       2589       1365
Swap:         1977          0       1977

As you can see, free reports 858 MB of cache memory, and that is also what smem seems to report in the Cache column of kernel dynamic memory.

Then I stressed the system further using Chromium Browser. At the beginning, only 83 MB of swap was in use. But after a few more tabs were opened, swap usage quickly climbed to almost its maximum and I experienced OOM! zram really has a dangerous side: when wrongly configured (sizes too big), it can quickly hit back at you like a trebuchet.

At that time I had the following outputs:

huygens@ubuntu:~$ smem -wt -K ~/vmlinuz-3.2.0-38-generic.unpacked -R 4096M
Area                           Used      Cache   Noncache 
firmware/hardware            130717          0     130717 
kernel image                  13951          0      13951 
kernel dynamic memory       1355344     124072    1231272 
userspace memory             961004      36456     924548 
free memory                 1733288    1733288          0 
----------------------------------------------------------
                            4194304    1893816    2300488 
huygens@ubuntu:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          3954       2256       1698          0          4        132
-/+ buffers/cache:       2118       1835
Swap:         1977       1750        227

See how the Cache and Noncache columns of kernel dynamic memory look inverted? In the first case, the kernel had "cached" memory, as reported by free. In the second case, it had swap memory held by zram, which smem does not know how to account for. Check the smem source code: zram occupation is not reported in /proc/meminfo, so smem does not compute it. smem simply takes "total kernel memory" minus "the types of memory reported by meminfo that I know are cache". What it does not know is that the computed total kernel memory includes the size of the swap that lives in RAM!
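The subtraction described above can be checked against the two smem tables. A small sketch in shell arithmetic, using only the "kernel dynamic memory" figures printed in this answer:

```shell
# kernel dynamic memory: Noncache = Used - Cache
before=$((1063520 - 922172))   # first table, swap still empty
after=$((1355344 - 124072))    # second table, zram swap nearly full
echo "noncache before: $before"
echo "noncache after:  $after"
echo "growth:          $((after - before))"
```

Both results match the Noncache column of the respective table, which is consistent with smem deriving that column as a plain remainder.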

When I was in this state, I activated a hard disk swap and turned off the zram swap and I reset the zram devices: echo 1 > /sys/block/zram0/reset.

After that, the noncache kernel memory melted like snow in summer and returned to a "normal" value.

Conclusion

smem does not know about zram (yet), maybe because zram is still in staging and thus not part of /proc/meminfo. meminfo reports global parameters (such as (in)active page sizes and total memory) and then only a few specific parameters. smem identifies a few of these specific parameters as "cache", sums them up, and compares that to total memory. Because of that, memory used by zram gets counted in the noncache column.

Note: by the way, in modern kernels meminfo also reports the shared memory consumed. smem does not take that into account yet, so even without zram the output of smem should be read carefully, especially if you use applications that make heavy use of shared memory.
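To see whether a given kernel reports that figure, the field can be read directly. A minimal sketch, assuming the shared memory figure is the `Shmem:` line of /proc/meminfo; `shmem_kb` is a hypothetical helper name.

```shell
# Print the Shmem value (kB) from a meminfo-format file.
# Prints nothing if the kernel does not report the field.
shmem_kb() {
    awk '/^Shmem:/ { print $2 }' "$1" 2>/dev/null
}

shmem_kb /proc/meminfo
```

The meminfo output quoted in the question, for instance, has no such line, so the helper would print nothing there.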


Linux – How to measure total amount of memory used by userspace processes in Linux

Using smem to show a total of all user memory, no swap, and not counting any shared memory twice:

sudo smem -c pss -t | tail -1

Output on my system:

Unrolling that:

-c pss selects the column, in this case PSS. From man smem:

      smem reports physical memory usage, taking shared memory  pages
      into  account.   Unshared memory is reported as the USS (Unique
      Set Size).  Shared memory is divided evenly among the processes
      sharing   that  memory.   The  unshared  memory  (USS)  plus  a
      process's proportion of shared memory is reported  as  the  PSS
      (Proportional Set Size).  The USS and PSS only include physical
      memory usage.  They do not include memory that has been swapped
      out to disk.

-t shows a total or sum of all PSS used at the end, and tail -1 nips off the preceding data.
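Where smem is not installed, a similar total can be approximated from the kernel's own accounting. This is a sketch under the assumption that summing the `Pss:` lines of /proc/*/smaps is close to what smem totals; it is not smem's actual implementation, and `smaps_total` is a hypothetical helper name.

```shell
# Sum one field (e.g. Pss) across smaps-format files; prints the total in kB.
# usage: smaps_total FIELD FILE...   (hypothetical helper)
smaps_total() {
    field=$1; shift
    # Compare the whole first column, so "Pss:" does not also match "SwapPss:".
    awk -v f="$field:" '$1 == f { sum += $2 } END { print sum + 0 }' "$@" 2>/dev/null
}

# Reading other users' smaps needs root, hence the sudo in the smem example.
smaps_total Pss /proc/[0-9]*/smaps
```

Without root, the result only covers processes whose smaps files are readable by the current user.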

To show just the total unshared user memory, replace -c pss with -c uss:

sudo smem -c uss -t | tail -1

Output:

Note the above PSS total is more or less the same number as shown in row #5, column #2 here:

smem -w

Output:

Area                           Used      Cache   Noncache 
firmware/hardware                 0          0          0 
kernel image                      0          0          0 
kernel dynamic memory       1367712    1115708     252004 
userspace memory            4112112     419884    3692228 
free memory                  570060     570060          0
