Shell – Why some shells' `read` builtin fails to read the whole line from a file in `/proc`

linux proc read shell

In some Bourne-like shells, the read builtin cannot read the whole line from a file in /proc (the command below should be run in zsh; replace $=shell with $shell for other shells):

$ for shell in bash dash ksh mksh yash zsh schily-sh heirloom-sh "busybox sh"; do
  printf '[%s]\n' "$shell"
  $=shell -c 'IFS= read x </proc/sys/fs/file-max; echo "$x"'
done
[bash]
602160
[dash]
6
[ksh]
602160
[mksh]
6
[yash]
6
[zsh]
6
[schily-sh]
602160
[heirloom-sh]
602160
[busybox sh]
6

The standard for read requires that the standard input be a text file; does that requirement cause the varied behaviors?


Having read the POSIX definition of a text file, I did some verification:

$ od -t a </proc/sys/fs/file-max 
0000000   6   0   2   1   6   0  nl
0000007

$ find /proc/sys/fs -type f -name 'file-max'
/proc/sys/fs/file-max

There's no NUL character in the content of /proc/sys/fs/file-max, and find reported it as a regular file (is this a bug in find?).

I guess the shells do something under the hood, much as file does:

$ file /proc/sys/fs/file-max
/proc/sys/fs/file-max: empty
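
A quick additional check (not in the original question) hints at why file says "empty": on a typical Linux system, stat() reports these sysctl files as zero-sized regular files, so file has no content size to go by:

$ stat -c 'size=%s type=%F' /proc/sys/fs/file-max
size=0 type=regular file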

Best Answer

The problem is that those /proc files on Linux appear as text files as far as stat()/fstat() is concerned, but do not behave as such.

Because the data is dynamic, you can only do one read() system call on them (for some of them at least). Doing more than one could get you two chunks of two different contents, so instead it seems a second read() on them just returns nothing (meaning end-of-file), unless you lseek() back to the beginning (and to the beginning only).
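
You can see that end-of-file behaviour without any shell builtin involved. In this sketch (mine, not from the original answer), two dd invocations share the same open file descriptor; the second one's read() returns nothing (the value printed is whatever file-max is on your system):

$ { dd bs=128 count=1; echo '--- second read() ---'; dd bs=128 count=1; } </proc/sys/fs/file-max 2>/dev/null
602160
--- second read() ---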

The read utility needs to read the content of files one byte at a time to be sure not to read past the newline character. That's what dash does:

$ strace -fe read dash -c 'read a < /proc/sys/fs/file-max'
read(0, "1", 1)                         = 1
read(0, "", 1)                          = 0

Some shells like bash have an optimisation to avoid having to do so many read() system calls. They first check whether the file is seekable, and if so, read in chunks as then they know they can put the cursor back just after the newline if they've read past it:

$ strace -e lseek,read bash -c 'read a' < /proc/sys/fs/file-max
lseek(0, 0, SEEK_CUR)                   = 0
read(0, "1628689\n", 128)               = 8

With bash, you'd still have problems with proc files that are larger than 128 bytes and can only be read in one read() system call.

bash also seems to disable that optimisation when the -d option is used.

ksh93 takes the optimisation even further, to the point of becoming bogus. ksh93's read does seek back, but it remembers the extra data it has read for the next read, so the next read (or any of its other builtins that read data, like cat or head) doesn't even try to read the data (even if that data has been modified by other commands in between):

$ seq 10 > a; ksh -c 'read a; echo test > a; read b; echo "$a $b"' < a
1 2
$ seq 10 > a; sh -c 'read a; echo test > a; read b; echo "$a $b"' < a
1 st

With ksh93, the second read returns 2 from the stale data cached during the first read, even though the file now contains test. With the other shell, the second read actually reads the file again: its offset is already two bytes in, so it picks up st out of the new test contents.
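
As a general workaround (a sketch of mine, not part of the answer), you can let cat perform the single large read() and capture its output with a command substitution; this behaves the same in all of the shells listed above (the value is again whatever your system reports):

$ dash -c 'x=$(cat /proc/sys/fs/file-max); echo "$x"'
602160

This works because cat reads with a buffer far larger than any of these files, so the whole content arrives in its first read(), and the command substitution captures everything cat wrote.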