The "memory used by a process" is not a clear cut concept in modern operating systems. What can be measured is the size of the address space of the process (SIZE) and resident set size (RSS, how many of the pages in the address space are currently in memory). Part of RSS is shared (most processes in memory share one copy of glibc, and so for assorted other shared libraries; several processes running the same executable share it, processes forked share read-only data and possibly a chunk of not-yet-modified read-write data with the parent). On the other hand, memory used for the process by the kernel isn't accounted for, like page tables, kernel buffers, and kernel stack. In the overall picture you have to account for the memory reserved for the graphics card, the kernel's use, and assorted "holes" reserved for DOS and other prehistoric systems (that isn't much, anyway).
The only way of getting an overall picture is what the kernel reports as such. Adding up numbers with unknown overlaps and unknown omissions is a nice exercise in arithmetic, nothing more.
This problem might be caused by an incorrect sizing of the maximum size of the connection tracking table and its hash table. The Linux kernel tries to allocate contiguous pages for the connection tracking tables used by the iptables nf_conntrack module. As you don't have enough physical memory, conntrack falls back to vmalloc.
This table is not grown dynamically based on established connections but, rather, fully allocated up front based on some kernel parameters.
An additional symptom might be a large number of "nf_conntrack: falling back to vmalloc." messages in /var/log/messages (or /var/log/kern.log, or both).
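A quick way to check (which of the two logs exists depends on the distribution):
grep 'falling back to vmalloc' /var/log/messages /var/log/kern.log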
This is easily solvable by fine-tuning your connection tracking table and sizing it down. Proper sizing has to be based on actual system usage: the connection tracking table needs to be large if the system is a dedicated network firewall, but can be much smaller if you are just using iptables to protect the host itself from network intrusions.
For more information on connection tracking tuning please refer to https://wiki.khnet.info/index.php/Conntrack_tuning
To fine-tune the values for your system, first evaluate the number of connections your system keeps open by running
conntrack -L
or
/sbin/sysctl net.netfilter.nf_conntrack_count
Better yet, keep a statistic of tracked connections over time (munin does this nicely) and use the maximum number of tracked connections as a baseline. Based on this information you can configure /etc/sysctl.conf accordingly.
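As a sketch, the relevant /etc/sysctl.conf entry might look like this; the value 65536 is purely illustrative and should be derived from your observed maximum:
# example only: cap the conntrack table at an explicitly chosen size
net.netfilter.nf_conntrack_max = 65536
Apply the change without rebooting with sysctl -p.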
When fine-tuning, also review how long connections are kept in the tracking table, and check whether the conntrack entries themselves make sense. Conntrack tables sometimes fill up with rubbish because of network or firewall misconfiguration; usually those are entries for connections that were never fully established. That may happen, for example, when the server receives incoming SYN packets but its replies are always lost somewhere on the network, or when a client disconnects abruptly and leaves sockets open for a long time.
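One rough way to inspect this is to break the tracked TCP connections down by state; a pile of SYN_SENT or SYN_RECV entries suggests connections that never completed the handshake (the awk field position assumes the usual conntrack-tools output format):
conntrack -L -p tcp 2>/dev/null | awk '{print $4}' | sort | uniq -c | sort -rn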
When fine-tuning these values, running
sysctl -a | grep conntrack | grep timeout
may provide some insight. The default values are quite conservative: 600 seconds (10 minutes) for the generic timeout and 432000 seconds (5 days) for an established TCP connection. Depending on the system's purpose and network behaviour, these may need to be tuned down to reduce the number of active entries in the conntrack table, which in turn allows a smaller table.
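As an illustration, the corresponding /etc/sysctl.conf entries could look like the following; the values are hypothetical, not recommendations:
# example only: expire idle entries sooner than the defaults
net.netfilter.nf_conntrack_generic_timeout = 120
net.netfilter.nf_conntrack_tcp_timeout_established = 86400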
Make sure, however, that you do not size the conntrack table down too far, as that can have the opposite effect: connections are dropped by iptables because they cannot be tracked, and you will start seeing messages such as this in your log files: 'kernel: ip_conntrack: table full, dropping packet.'
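A simple sanity check after resizing is to watch the current count against the new maximum and confirm it stays comfortably below it:
watch -n 5 'sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max'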
To confirm whether that is the problem, please provide the output of the following:
cat /proc/sys/net/ipv4/ip_conntrack_max
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_buckets
Best Answer
On some demand-paged virtual memory systems, the operating system refuses to allocate anonymous pages (i.e. pages containing data without a filesystem source, such as runtime data, program stack, etc.) unless there is sufficient swap space to swap out the pages in order to free up physical memory. This strict accounting has the advantage that each process is guaranteed access to as much virtual memory as it allocates, but it also means that the amount of virtual memory available is essentially limited by the size of the swap space.
In practice, programs tend to allocate more memory than they use. For instance, the Java Virtual Machine allocates a lot of virtual memory on startup but does not use it immediately. Memory accounting in the Linux kernel attempts to compensate for this by tracking the amount of memory actually in use by processes, and overcommits the amount of virtual memory. In other words, the amount of virtual memory allocated by the kernel can exceed the amount of physical memory and swap space combined on the system. While this leads to better utilization of physical memory and swap space, the downside is that when the amount of memory in use exceeds the amount of physical memory and swap space available, the kernel must somehow free memory resources in order to meet the memory allocation commitment.
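The kernel's view of this accounting is visible in /proc/meminfo: Committed_AS is the total virtual memory currently committed, and CommitLimit is the ceiling (enforced only in strict accounting mode, described below):
grep -E 'CommitLimit|Committed_AS' /proc/meminfo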
The kernel mechanism that is used to reclaim memory to cover the overcommitment is called the out-of-memory killer (OOM-killer). Typically the mechanism will start killing off memory-hogging "rogue" processes to free up memory for other processes. However, if the vm.panic_on_oom sysctl setting is non-zero, the kernel will panic instead when the system runs out of memory. The possible values for the vm.panic_on_oom setting are as follows:

0 (default) When an out-of-memory situation arises, the OOM-killer kills a rogue process.
1 The kernel normally panics, but if the out-of-memory condition arises within a memory allocation restricted with mbind(MPOL_BIND) or cpusets, only the offending process is killed and no panic occurs.
2 The kernel always panics in an out-of-memory situation.

The heuristic used by the OOM-killer can be modified through the vm.oom_kill_allocating_task sysctl setting. The possible values are as follows:

0 (default) The OOM-killer scans the task list and selects a rogue task using a lot of memory to kill.
non-zero The OOM-killer kills the task that triggered the out-of-memory condition.

The kernel memory accounting algorithm can be tuned with the vm.overcommit_memory sysctl setting. The possible values are as follows:

0 (default) Heuristic overcommit with weak checks.
1 Always overcommit, no checks.
2 Strict accounting; in this mode the virtual address space limit is determined by the value of the vm.overcommit_ratio setting according to the following formula:

virtual memory limit = swap space + (physical RAM * vm.overcommit_ratio / 100)

When strict memory accounting is in use, the kernel will no longer allocate anonymous pages unless it has enough free physical memory or swap space to store them. This means it is essential that the system is configured with enough swap space.
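As a worked example with purely hypothetical numbers: on a machine with 4 GiB of RAM, 2 GiB of swap and vm.overcommit_ratio = 50, the limit works out to 2 GiB + (4 GiB * 50 / 100) = 4 GiB of committable virtual memory.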
The sysctl settings can be checked or modified at runtime with the sysctl command. To make changes permanent, the settings can be written to /etc/sysctl.conf. The above settings are also available via the /proc/sys/vm interface. The corresponding files are:

/proc/sys/vm/panic_on_oom
/proc/sys/vm/oom_kill_allocating_task
/proc/sys/vm/overcommit_memory
/proc/sys/vm/overcommit_ratio
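For instance (the value 2 here is just an example, enabling strict accounting):
sysctl vm.overcommit_memory
sysctl -w vm.overcommit_memory=2
To make this persist across reboots, add the line vm.overcommit_memory = 2 to /etc/sysctl.conf and reload with sysctl -p.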