RSS is how much memory this process currently has in main memory (RAM). VSZ is how much virtual memory the process has in total. This includes all types of memory, both in RAM and swapped out. These numbers can get skewed because they also include shared libraries and other types of memory. You can have five hundred instances of bash
running, and the total size of their memory footprint won't be the sum of their RSS or VSZ values.
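Both numbers are easy to inspect with ps; here the current shell is used as an example (RSS and VSZ are reported in KiB):

```shell
# print the resident set size and virtual size of the current shell, in KiB
ps -o pid,rss,vsz,comm -p $$
```

Comparing the two columns across several bash instances makes the shared-library effect visible: every instance's RSS counts the same shared pages again.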
If you need a more detailed picture of a process's memory footprint, you have some options. You can go through /proc/$PID/maps and weed out the mappings you don't care about. If shared libraries are involved, the calculation can get complex depending on your needs.
If you only care about the heap size of the process, you can always just parse the [heap] entry in the maps file. Bear in mind that the size the kernel has allocated for the process heap may or may not reflect the exact number of bytes the process has asked for: there are minute details, kernel internals and optimisations which can throw this off. In an ideal world, it will be as much as your process needs, rounded up to the nearest multiple of the system page size (getconf PAGESIZE will tell you what it is; on PCs, it is probably 4096 bytes).
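As a minimal sketch (assuming a Linux /proc and a process that actually has a [heap] mapping — the current shell is used here), the heap size can be computed from the address range of that entry:

```shell
# read the [heap] line of the maps file and subtract start from end
pid=$$                                      # PID of interest; here, this shell
range=$(awk '/\[heap\]/ { print $1 }' "/proc/$pid/maps")
start=0x${range%-*}                         # first column is start-end in hex
end=0x${range#*-}
echo $(( end - start ))                     # heap size in bytes
```

The result is always a multiple of the page size, which is exactly the rounding described above.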
If you want to see how much memory a process has allocated, one of the best ways is to forgo the kernel-side metrics and instead instrument the C library's heap memory (de)allocation functions with the LD_PRELOAD mechanism. Personally, I slightly abuse valgrind to get information about this sort of thing. (Note that applying the instrumentation will require restarting the process.)
Please note, since you may also be benchmarking runtimes, that valgrind will make your programs slower (but probably within your tolerances).
Answers
- Definitely not a bug.
The parameter which defines the maximum size of a single argument is MAX_ARG_STRLEN. There is no documentation for this parameter other than the comments in binfmts.h:
/*
* These are the maximum length and maximum number of strings passed to the
* execve() system call. MAX_ARG_STRLEN is essentially random but serves to
* prevent the kernel from being unduly impacted by misaddressed pointers.
* MAX_ARG_STRINGS is chosen to fit in a signed 32-bit integer.
*/
#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
#define MAX_ARG_STRINGS 0x7FFFFFFF
As is shown, Linux also has a (very large) limit on the number of arguments to a command.
A limit on the size of a single argument (which differs from the overall limit on arguments plus environment) does appear to be specific to Linux. This article gives a detailed comparison of ARG_MAX and its equivalents on Unix-like systems. MAX_ARG_STRLEN is discussed for Linux, but there is no mention of any equivalent on other systems.
The above article also states that MAX_ARG_STRLEN was introduced in Linux 2.6.23, along with a number of other changes relating to command argument maximums (discussed below). The log/diff for the commit can be found here.
It is still not clear what accounts for the additional discrepancy between the result of getconf ARG_MAX and the actual maximum possible size of arguments plus environment. Stephane Chazelas' related answer suggests that part of the space is accounted for by pointers to each of the argument/environment strings. However, my own investigation suggests that these pointers are not created early in the execve system call, when it may still return an E2BIG error to the calling process (although pointers to each argv string are certainly created later).
Also, the strings are contiguous in memory as far as I can see, so there are no memory gaps due to alignment here, although alignment is very likely to be a factor in whatever does use up the extra memory. Understanding what uses the extra space would require a more detailed knowledge of how the kernel allocates memory (which is useful knowledge to have, so I will investigate and update later).
ARG_MAX Confusion
Since Linux 2.6.23 (as a result of this commit), command argument maximums are handled in a way that makes Linux differ from other Unix-like systems. In addition to adding MAX_ARG_STRLEN and MAX_ARG_STRINGS, the result of getconf ARG_MAX now depends on the stack size and may be different from ARG_MAX in limits.h.
Normally the result of getconf ARG_MAX will be 1/4 of the stack size. Consider the following in bash, using ulimit to get the stack size:
$ echo $(( $(ulimit -s)*1024 / 4 )) # ulimit output in KiB
2097152
$ getconf ARG_MAX
2097152
However, the above behaviour was changed slightly by this commit (added in Linux 2.6.25-rc4~121).
ARG_MAX in limits.h now serves as a hard lower bound on the result of getconf ARG_MAX. If the stack size is set such that 1/4 of it is less than ARG_MAX in limits.h, then the limits.h value will be used:
$ grep ARG_MAX /usr/include/linux/limits.h
#define ARG_MAX 131072 /* # bytes of args + environ for exec() */
$ ulimit -s 256
$ echo $(( $(ulimit -s)*1024 / 4 ))
65536
$ getconf ARG_MAX
131072
Note also that if the stack size is set lower than the minimum possible ARG_MAX, then the size of the stack (RLIMIT_STACK) becomes the upper limit on argument/environment size before E2BIG is returned (although getconf ARG_MAX will still show the value in limits.h).
A final thing to note is that if the kernel is built without CONFIG_MMU (support for memory management hardware), then the checking of ARG_MAX is disabled, so that limit does not apply. MAX_ARG_STRLEN and MAX_ARG_STRINGS still apply, though.
Further Reading
Best Answer
A process's resident set size is the amount of memory that belongs to it and is currently present (resident) in RAM (real RAM, not swapped or otherwise not-resident).
For instance, if a process allocates a chunk of memory (say 100 MB) and uses it actively (reads/writes to it), its resident set size will be about 100 MB (plus overhead, the code segment, etc.). If the process then stops using (but doesn't release) that memory for a while, the OS can opt to swap chunks of it out to make room for other processes (or cache). The resident set size would then decrease by the amount the kernel swapped out. If the process wakes up and starts re-using that memory, the kernel re-loads the data from swap and the resident set size goes up again.
The ru_maxrss field of struct rusage is the "high water mark" for the resident set size: it indicates the peak RAM use of this process (when using RUSAGE_SELF). You can limit a process's resident set size to avoid having a single application "eat up" all the RAM on your system and force other applications to swap (or fail entirely with out-of-memory conditions).
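Both the current and the peak resident set size are visible in /proc without writing any code; VmHWM is the same high-water mark that ru_maxrss reports:

```shell
# current (VmRSS) and peak (VmHWM) resident set size of the reading process
grep -E '^Vm(RSS|HWM)' /proc/self/status
```

Here /proc/self refers to the grep process itself, so the numbers describe grep; substitute a PID to inspect another process.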