Shell – Why some shells' `read` builtin fails to read the whole line from a file in `/proc`

linux proc read shell

In some Bourne-like shells, the read builtin cannot read the whole line from a file in /proc (the command below should be run in zsh; replace $=shell with $shell for other shells):

$ for shell in bash dash ksh mksh yash zsh schily-sh heirloom-sh "busybox sh"; do
  printf '[%s]\n' "$shell"
  $=shell -c 'IFS= read x </proc/sys/fs/file-max; echo "$x"'
done
[bash]
602160
[dash]
6
[ksh]
602160
[mksh]
6
[yash]
6
[zsh]
6
[schily-sh]
602160
[heirloom-sh]
602160
[busybox sh]
6

The standard for read requires that the standard input be a text file; does that requirement cause the varied behaviors?


Having read the POSIX definition of a text file, I did some verification:

$ od -t a </proc/sys/fs/file-max 
0000000   6   0   2   1   6   0  nl
0000007

$ find /proc/sys/fs -type f -name 'file-max'
/proc/sys/fs/file-max

There's no NUL character in the content of /proc/sys/fs/file-max, and find reported it as a regular file (is this a bug in find?).

I guess the shells do something under the hood, much as file does:

$ file /proc/sys/fs/file-max
/proc/sys/fs/file-max: empty
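
A quick additional check (not in the original question) hints at why file says "empty": on a typical Linux system, stat() reports these sysctl files as zero-sized regular files, so file has no content size to go by:

$ stat -c 'size=%s type=%F' /proc/sys/fs/file-max
size=0 type=regular file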

Best Answer

The problem is that those /proc files on Linux appear as text files as far as stat()/fstat() is concerned, but do not behave as such.

Because the data is dynamic, you can only do one read() system call on them (for some of them at least). Doing more than one could get you two chunks of two different contents, so instead it seems a second read() on them just returns nothing (meaning end-of-file), unless you lseek() back to the beginning (and to the beginning only).
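
You can see that end-of-file behaviour without any shell builtin involved. In this sketch (mine, not from the original answer), two dd invocations share the same open file descriptor; the second one's read() returns nothing (the value printed is whatever file-max is on your system):

$ { dd bs=128 count=1; echo '--- second read() ---'; dd bs=128 count=1; } </proc/sys/fs/file-max 2>/dev/null
602160
--- second read() ---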

The read utility needs to read the content of files one byte at a time to be sure not to read past the newline character. That's what dash does:

$ strace -fe read dash -c 'read a < /proc/sys/fs/file-max'
read(0, "1", 1)                         = 1
read(0, "", 1)                          = 0

Some shells like bash have an optimisation to avoid having to do so many read() system calls. They first check whether the file is seekable, and if so, read in chunks as then they know they can put the cursor back just after the newline if they've read past it:

$ strace -e lseek,read bash -c 'read a' < /proc/sys/fs/file-max
lseek(0, 0, SEEK_CUR)                   = 0
read(0, "1628689\n", 128)               = 8

With bash, you'd still have problems with proc files that are larger than 128 bytes and can only be read in one read() system call.

bash also seems to disable that optimisation when the -d option is used.

ksh93 takes the optimisation even further, to the point of becoming bogus. ksh93's read does seek back, but it remembers the extra data it has read for the next read, so the next read (or any of its other builtins that read data, like cat or head) doesn't even try to read the data (even if that data has been modified by other commands in between):

$ seq 10 > a; ksh -c 'read a; echo test > a; read b; echo "$a $b"' < a
1 2
$ seq 10 > a; sh -c 'read a; echo test > a; read b; echo "$a $b"' < a
1 st

With ksh93, the second read returns 2 from the stale data cached during the first read, even though the file now contains test. With the other shell, the second read actually reads the file again: its offset is already two bytes in, so it picks up st out of the new test contents.
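
As a general workaround (a sketch of mine, not part of the answer), you can let cat perform the single large read() and capture its output with a command substitution; this behaves the same in all of the shells listed above (the value is again whatever your system reports):

$ dash -c 'x=$(cat /proc/sys/fs/file-max); echo "$x"'
602160

This works because cat reads with a buffer far larger than any of these files, so the whole content arrives in its first read(), and the command substitution captures everything cat wrote.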