kernel – What Defines the Maximum Size for a Command Single Argument?

argumentsechokernel

I was under the impression that the maximum length of a single argument was not the problem here so much as the total size of the overall argument array plus the size of the environment, which is limited to ARG_MAX. Thus I thought that something like the following would succeed:

env_size=$(cat /proc/$$/environ | wc -c)
(( arg_size = $(getconf ARG_MAX) - $env_size - 100 ))
/bin/echo $(tr -dc [:alnum:] </dev/urandom | head -c $arg_size) >/dev/null

With the - 100 being more than enough to account for the difference between the size of the environment in the shell and the echo process. Instead I got the error:

bash: /bin/echo: Argument list too long

After playing around for a while, I found that the maximum was a full hex order of magnitude smaller:

/bin/echo \
  $(tr -dc [:alnum:] </dev/urandom | head -c $(($(getconf ARG_MAX)/16-1))) \
  >/dev/null

When the minus one is removed, the error returns. Seemingly the maximum for a single argument is actually ARG_MAX/16 and the -1 accounts for the null byte placed at the end of the string in the argument array.

Another issue is that when the argument is repeated, the total size of the argument array can be closer to ARG_MAX, but still not quite there:

args=( $(tr -dc [:alnum:] </dev/urandom | head -c $(($(getconf ARG_MAX)/16-1))) )
for x in {1..14}; do
  args+=( ${args[0]} )
done

/bin/echo "${args[@]}" "${args[0]:6534}" >/dev/null

Using "${args[0]:6533}" here makes the last argument 1 byte longer and gives the Argument list too long error. This difference is unlikely to be accounted for by the size of the environment given:

$ cat /proc/$$/environ | wc -c
1045

Questions:

  1. Is this correct behaviour, or is there a bug somewhere?
  2. If not, is this behaviour documented anywhere? Is there another parameter which defines the maximum for a single argument?
  3. Is this behaviour limited to Linux (or even particular versions of such)?
  4. What accounts for the additional ~5KB discrepancy between the actual maximum size of the argument array plus the approximate size of the environment and ARG_MAX?

Additional info:

uname -a
Linux graeme-rock 3.13-1-amd64 #1 SMP Debian 3.13.5-1 (2014-03-04) x86_64 GNU/Linux

Best Answer

Answers

  1. Definitely not a bug.
  2. The parameter which defines the maximum size for one argument is MAX_ARG_STRLEN. There is no documentation for this parameter other than the comments in binfmts.h:

    /*
     * These are the maximum length and maximum number of strings passed to the
     * execve() system call.  MAX_ARG_STRLEN is essentially random but serves to
     * prevent the kernel from being unduly impacted by misaddressed pointers.
     * MAX_ARG_STRINGS is chosen to fit in a signed 32-bit integer.
     */
    #define MAX_ARG_STRLEN (PAGE_SIZE * 32)
    #define MAX_ARG_STRINGS 0x7FFFFFFF
    

    As is shown, Linux also has a (very large) limit on the number of arguments to a command.

  3. A limit on the size of a single argument (which differs from the overall limit on arguments plus environment) does appear to be specific to Linux. This article gives a detailed comparison of ARG_MAX and equivalents on Unix like systems. MAX_ARG_STRLEN is discussed for Linux, but there is no mention of any equivalent on any other systems.

    The above article also states that MAX_ARG_STRLEN was introduced in Linux 2.6.23, along with a number of other changes relating to command argument maximums (discussed below). The log/diff for the commit can be found here.

  4. It is still not clear what accounts for the additional discrepancy between the result of getconf ARG_MAX and the actual maximum possible size of arguments plus environment. Stephane Chazelas' related answer, suggests that part of the space is accounted for by pointers to each of the argument/environment strings. However, my own investigation suggests that these pointers are not created early in the execve system call when it may still return a E2BIG error to the calling process (although pointers to each argv string are certainly created later).

    Also, the strings are contiguous in memory as far as I can see, so no memory gaps due do alignment here. Although is very likely to be a factor within whatever does use up the extra memory. Understanding what uses the extra space requires a more detailed knowledge of how the kernel allocates memory (which is useful knowledge to have, so I will investigate and update later).

ARG_MAX Confusion

Since the Linux 2.6.23 (as result of this commit), there have been changes to the way that command argument maximums are handled which makes Linux differ from other Unix-like systems. In addition to adding MAX_ARG_STRLEN and MAX_ARG_STRINGS, the result of getconf ARG_MAX now depends on the stack size and may be different from ARG_MAX in limits.h.

Normally the result of getconf ARG_MAX will be 1/4 of the stack size. Consider the following in bash using ulimit to get the stack size:

$ echo $(( $(ulimit -s)*1024 / 4 ))  # ulimit output in KiB
2097152
$ getconf ARG_MAX
2097152

However, the above behaviour was changed slightly by this commit (added in Linux 2.6.25-rc4~121). ARG_MAX in limits.h now serves as a hard lower bound on the result of getconf ARG_MAX. If the stack size is set such that 1/4 of the stack size is less than ARG_MAX in limits.h, then the limits.h value will be used:

$ grep ARG_MAX /usr/include/linux/limits.h 
#define ARG_MAX       131072    /* # bytes of args + environ for exec() */
$ ulimit -s 256
$ echo $(( $(ulimit -s)*1024 / 4 ))
65536
$ getconf ARG_MAX
131072

Note also that if the stack size set lower than the minimum possible ARG_MAX, then the size of the stack (RLIMIT_STACK) becomes the upper limit of argument/environment size before E2BIG is returned (although getconf ARG_MAX will still show the value in limits.h).

A final thing to note is that if the kernel is built without CONFIG_MMU (support for memory management hardware), then the checking of ARG_MAX is disabled, so the limit does not apply. Although MAX_ARG_STRLEN and MAX_ARG_STRINGS still apply.

Further Reading

Related Question