POSIX Compliance – Is Linux ARG_MAX Different from Other System Variables?


In the shell, as explained in this Q&A in the context of expansion, the maximum length of a command's arguments depends on the system and is initially constrained by the kernel setup. The maximum value can be read at runtime with the getconf command (see also IEEE Std 1003.1, 2013 Edition):

# getconf ARG_MAX
2097152
vs. the value found in limits.h on my setup:
#define ARG_MAX       131072    /* # bytes of args + environ for exec() */

Indeed:

The sysconf() call supplies a value that corresponds to the conditions
when the program was either compiled or executed, depending on the
implementation; the system() call to getconf always supplies a value
corresponding to conditions when the program is executed.

The manpages reference POSIX, from the prolog, which alludes to the POSIX Programmer's Manual, to the description itself:

The value of each configuration variable shall be determined as if it
were obtained by calling the function from which it is defined to be
available by this volume of POSIX.1-2008 or by the System Interfaces
volume of POSIX.1-2008 (see the OPERANDS section). The value shall
reflect conditions in the current operating environment.

The basic variables which can be queried appear in the table for the sysconf function specification and there is more information about the values in the limits.h header documentation:

{ARG_MAX}
    Maximum length of argument to the exec functions including environment data.
    Minimum Acceptable Value: {_POSIX_ARG_MAX}
... (NB: a system cannot be POSIX compliant below a certain value ...)
{_POSIX_ARG_MAX}
    Maximum length of argument to the exec functions including environment data.
    Value: 4096
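As a minimal sketch of the runtime query described above, the following Python snippet asks sysconf for ARG_MAX and compares it with the POSIX floor and with the header value. Python is used only for illustration. The constant names `POSIX_ARG_MAX_FLOOR` and `HEADER_ARG_MAX` and the helper `runtime_arg_max` are made up here; their values are copied from the excerpts above.

```python
import os

# Values copied from the excerpts above; the constant names are illustrative.
POSIX_ARG_MAX_FLOOR = 4096   # {_POSIX_ARG_MAX}
HEADER_ARG_MAX = 131072      # ARG_MAX from the limits.h quoted in the question


def runtime_arg_max() -> int:
    """Return ARG_MAX as reported by sysconf() for the running system."""
    return os.sysconf("SC_ARG_MAX")


if __name__ == "__main__":
    value = runtime_arg_max()
    print("runtime ARG_MAX:", value)
    print("meets the POSIX floor:", value >= POSIX_ARG_MAX_FLOOR)
    print("differs from the header value:", value != HEADER_ARG_MAX)
```

The point of the comparison is that the header constant is fixed when the header is written, whereas the sysconf result reflects the system the program is actually running on.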

The xargs --show-limits command confirms some of this:

Your environment variables take up 3134 bytes
POSIX upper limit on argument length (this system): 2091970
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2088836
Size of command buffer we are actually using: 131072
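The figures in this output can be reproduced by simple arithmetic, sketched below. This is a reading of the numbers, not a statement about how xargs is implemented: it assumes that a 2048-byte headroom (the margin the POSIX xargs description asks for) and the environment size are subtracted from ARG_MAX, and that the environment size is subtracted once more for the "actually use" figure. The function name `xargs_limits` is hypothetical.

```python
ARG_MAX = 2097152   # from `getconf ARG_MAX` above
ENV_BYTES = 3134    # "Your environment variables take up 3134 bytes"
HEADROOM = 2048     # assumed safety margin left below ARG_MAX


def xargs_limits(arg_max: int, env_bytes: int, headroom: int = HEADROOM):
    """Return (upper limit, usable length) under the assumptions stated above."""
    upper = arg_max - env_bytes - headroom
    usable = upper - env_bytes
    return upper, usable


print(xargs_limits(ARG_MAX, ENV_BYTES))  # -> (2091970, 2088836)
```

Both results match the values printed by xargs --show-limits. The 131072-byte buffer on the last line is a separate figure that this arithmetic does not explain.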

sysconf was initially designed to find the system value for the PATH variable, then it was extended to other variables. Now, the Open Group documentation explores the rationale for having such a framework where applications can poll for system variables at runtime, and the related practical considerations about the baseline…:

(…) If limited to the most restrictive values in the headers, such
applications would have to be prepared to accept the most limited
environments offered by the smallest microcomputers. Although this is
entirely portable, there was a consensus that they should be able to
take advantage of the facilities offered by large systems, without the
restrictions associated with source and object distributions.

During the discussions of this feature, it was pointed out that it is
almost always possible for an application to discern what a value
might be at runtime by suitably testing the various functions
themselves. And, in any event, it could always be written to
adequately deal with error returns from the various functions. In the
end, it was felt that this imposed an unreasonable level of
complication and sophistication on the application developer.

…as well as the shortcomings of such a setup as it relates to some file variables with fpathconf:

The pathconf() function was proposed immediately after the sysconf()
function when it was realized that some configurable values may differ
across file system, directory, or device boundaries.

For example, {NAME_MAX} frequently changes between System V and
BSD-based file systems; System V uses a maximum of 14, BSD 255. On an
implementation that provides both types of file systems, an
application would be forced to limit all pathname components to 14
bytes, as this would be the value specified in <limits.h> on such a
system.
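To illustrate the per-path query that the quoted rationale motivates, here is a small Python sketch that asks pathconf for {NAME_MAX} on a given directory. The helper name `name_max` and the directories queried are arbitrary choices for the example; the results depend on the file systems mounted on the machine.

```python
import os


def name_max(path: str) -> int:
    """Return {NAME_MAX} for the file system that holds `path`."""
    return os.pathconf(path, "PC_NAME_MAX")


if __name__ == "__main__":
    # Directories on different file systems may report different limits.
    for directory in ("/", os.getcwd()):
        print(directory, name_max(directory))
```

Because the limit is asked of a path rather than of the system as a whole, an application need not fall back to the smallest value that any file system on the machine would impose.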

So the intent was to relieve developers of some burden for the baseline while also acknowledging variety in the filesystems and generally enabling some customizing on different variants of the platform. The evolution of hardware, Unix and related standards (C and POSIX) plays a role here.

Questions:

  • The command getconf doesn't have a "list" option, and set, printenv or export don't show those variables. Is there a command which lists their value?
  • Why were facilities like fpathconf seemingly built to introduce more flexibility, but only for PATH and file-related system variables? Is it just because at that time getconf was only about PATH?
  • What is the current Linux implementation, and is it POSIX compliant? In the linked Q there is reference in the answers to ARG_MAX varying with the stack size ("on Linux 3.11… a quarter of the limit set on the stack size, or 128kiB if that's less than 512kiB"):
    • What is the rationale for this?
    • Is this choice (1/4 of the stack size) a Linux specific implementation or just a feature on top of the basic implementation or did the historical UNIX implementation always yield basically that 1/4th of the stack size?
    • Are many other variables besides ARG_MAX a function of the stack size or similar resources or does the importance of this variable warrant a special treatment?
    • Practically, does one deliver a POSIX compliant Linux system/solution and there's configuration of the stack size limit for example to allow some application to go beyond the basic maximum spec if it scales up with the hardware or is it a practice to customize directly limits.h and compile for specific needs?
    • What is the difference for something like ARG_MAX between using limits.h vs. changing the variable at runtime with something like the ulimit -s command vs. having the kernel manage it directly? In particular, is the (low) value of that variable in my limits.h obsolete on Linux because of kernel changes, i.e. has it been superseded?
  • The command line supposedly has shell specific length restrictions which are not related to expansion and ARG_MAX; what are they in bash?

Best Answer

There is no standard way to retrieve the list of configuration variables that are supported on a system. If you program for a given POSIX version, the list in that version of the POSIX specification is your reference list. On Linux, getconf -a lists all available variables.
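For a programmatic listing, one option (a sketch, not a standard facility) is to walk the sysconf names that Python's os module knows about. The helper name `list_sysconf` is made up for this example, and the set of names comes from how Python was built on the host, not from any list defined by POSIX.

```python
import os


def list_sysconf() -> dict:
    """Map each sysconf name known to Python to its value on this host."""
    values = {}
    for name in sorted(os.sysconf_names):
        try:
            values[name] = os.sysconf(name)
        except (OSError, ValueError):
            # The name is defined in Python but not supported by this system.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in list_sysconf().items():
        print(name, value)
```

This covers only the sysconf variables. Path-dependent variables would need os.pathconf_names together with a path to query.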

fpathconf isn't specific to PATH. It's about variables that are related to files, which are the ones that may vary from file to file.

Regarding ARG_MAX on Linux, the rationale for depending on the stack size is that the arguments end up on the stack, so there had better be enough room for them plus everything else that must fit. Most other implementations (including older versions of Linux) have a fixed size.
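The rule quoted in the question ("a quarter of the limit set on the stack size, or 128kiB if that's less than 512kiB") can be sketched as follows. This is a model of that quoted description only, not kernel or libc code, and the name `arg_max_from_stack` is hypothetical. Actual systems may differ, for instance by applying further caps.

```python
import resource

KIB = 1024
FLOOR = 128 * KIB  # "128kiB if that's less than 512kiB"


def arg_max_from_stack(stack_limit_bytes: int) -> int:
    """A quarter of the stack limit, but never below 128 KiB (quoted rule)."""
    return max(stack_limit_bytes // 4, FLOOR)


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        print("stack limit is unlimited; the rule gives no finite figure")
    else:
        print("estimated ARG_MAX:", arg_max_from_stack(soft))
```

Under this model, a stack limit of 8 MiB gives 2097152, which is the figure getconf printed in the question, and any stack limit below 512 KiB gives 131072, which is the figure in the asker's limits.h. That the asker's stack limit is 8 MiB is an assumption.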

Most limits go together with resource availability, with different resources depending on the limit. For example, a process may be unable to open a file even if it has fewer than OPEN_MAX files open, if the system is out of memory that can be used for the file-related data.

Linux is POSIX-compliant on this point by default, so I don't know what you're getting at.

If you use ulimit -s to restrict the stack size to less than ARG_MAX, you're making the system no longer compliant. A POSIX system can typically be made non-compliant in any number of ways, including PATH=/nowhere (making all standard utilities unavailable) or rm -rf /.

The value of ARG_MAX in limits.h provides a minimum that applications can rely on. A POSIX-compliant system is allowed to let execve succeed even if the arguments exceed that size. The guarantee related to ARG_MAX is that if the arguments fit in that size then execve will not fail due to E2BIG.
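Since exceeding ARG_MAX is not guaranteed to fail, a program can simply attempt the exec and react to E2BIG instead of pre-computing sizes. A minimal sketch of that approach follows; the helper name `run_or_report` is made up, and returning None on E2BIG is just one possible convention.

```python
import errno
import subprocess
import sys


def run_or_report(argv):
    """Run argv; return its exit status, or None if exec failed with E2BIG."""
    try:
        return subprocess.run(argv, check=False).returncode
    except OSError as exc:
        if exc.errno == errno.E2BIG:
            # Argument list too long: the caller may split the work and retry.
            return None
        raise


if __name__ == "__main__":
    print(run_or_report([sys.executable, "-c", "pass"]))
```

A caller that gets None back could split the argument list into smaller batches, which is essentially what xargs does.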