Some time ago, I posted an answer to a question about scripting. Someone pointed out that I shouldn't use the following command:
for x in $(cat file); do something; done
but this instead:
while read f; do something; done < file
The Useless Use of Cat article is supposed to explain the whole problem, but the only explanation is:
The backticks are outright dangerous, unless you know the result of
the backticks is going to be less than or equal to how long a command
line your shell can accept. (Actually, this is a kernel limitation.
The constant ARG_MAX in your limits.h should tell you how much your
own system can take. POSIX requires ARG_MAX to be at least 4,096
bytes.)
If I understood this correctly, bash(?) should crash if I use the output of a very big file in a command (it should exceed ARG_MAX, defined in limits.h). So I checked ARG_MAX with this command:
> grep ARG_MAX /usr/src/kernels/$(uname -r)/include/uapi/linux/limits.h
#define ARG_MAX 131072 /* # bytes of args + environ for exec() */
Then I created a file containing text with no spaces:
> ls -l
-rw-r--r--. 1 root root 100000000 Aug 21 15:37 in_file
Then I ran:
for i in $(cat in_file); do echo $i; done
aaaand nothing terrible happened.
So what should I do to check if/how this whole 'don't use cat with loop' thing is dangerous?
Best Answer
It depends on what file is meant to contain. If it's meant to contain an IFS-separated list of shell globs (assuming the default value of $IFS), then for i in $(cat file) would be the way to go, as that's what an unquoted $(cat file) does: it applies the split+glob operator to the output of cat file, stripped of its trailing newline characters. So it would loop over each filename resulting from the expansion of those globs (except where a glob doesn't match any file, in which case the glob is left there unexpanded).

If you wanted to loop over each delimited line of file, you'd read it with while IFS= read -r line; do ...; done < file.

With a for loop, you could loop over every non-empty line by setting IFS to a newline and disabling globbing (set -o noglob) before running for line in $(cat file).

However, the plain while read f; do something; done < file from the question is not equivalent to either of those.
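The line-reading variants discussed above can be compared side by side. This is only a minimal sketch, assuming a POSIX sh; the sample file contents and the variable names verbatim, nonempty and plain are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file: leading blanks, a backslash, an empty line, a glob.
tmp=$(mktemp) || exit 1
printf '%s\n' '  two  words' 'back\slash' '' '*' > "$tmp"

# 1. Line by line, verbatim: IFS= keeps leading/trailing blanks,
#    -r keeps backslashes.
verbatim=
while IFS= read -r line; do
  verbatim="${verbatim}[$line]"
done < "$tmp"

# 2. for loop over non-empty lines: split on newline only, globbing disabled.
#    Run in a command substitution (a subshell) so that the IFS and noglob
#    changes do not leak into the rest of the script.
nonempty=$(
  IFS='
'
  set -o noglob
  for line in $(cat "$tmp"); do
    printf '[%s]' "$line"
  done
)

# 3. Plain read: surrounding blanks are trimmed and the backslash is consumed.
plain=
while read line; do
  plain="${plain}[$line]"
done < "$tmp"

rm -f "$tmp"
printf '%s\n' "verbatim: $verbatim" "nonempty: $nonempty" "plain:    $plain"
```

With this sample input, the first loop should keep every line intact, the second should drop the empty line, and the third should lose the leading blanks and the backslash.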
The plain while read form makes little sense: it reads the content of file in a very convoluted way, where characters of $IFS and backslashes are treated specially.

In any case, the ARG_MAX limit that the text you're quoting refers to is on the execve() system call (on the cumulative size of the arguments and environment variables), so it only applies when a command on the filesystem is executed with the possibly very long expansion of the split+glob operator applied to the command substitution (that text is misleading and wrong on several counts).

It would apply, for instance, when running an external command such as /bin/true $(cat file).
But not in for i in $(cat file), where there's no execve() system call involved. Compare:
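A minimal sketch of such a comparison follows. It assumes a shell where true is a builtin, that /bin/true exists as an external command, and that roughly 8 MB of arguments is above the execve() limit on the system (getconf ARG_MAX reports the configured value). Instead of a glob such as /*/*/*/*, whose expansion depends on the filesystem, it builds a file of long words so the size of the expansion is predictable; the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: the same large expansion given to a builtin, a for loop and an
# external command. Assumes /bin/true exists and that ~8 MB of arguments
# exceeds the execve() limit here.
tmp=$(mktemp) || exit 1

# 8000 words of 1000 "a" characters each, one per line (about 8 MB).
awk 'BEGIN {
  w = sprintf("%1000s", ""); gsub(/ /, "a", w)
  for (i = 0; i < 8000; i++) print w
}' > "$tmp"

# Builtin: run inside the shell process, no execve() involved.
if true $(cat "$tmp"); then builtin_ok=yes; else builtin_ok=no; fi

# for loop: also handled entirely inside the shell.
count=0
for word in $(cat "$tmp"); do
  count=$((count + 1))
done

# External command: the expansion is passed to execve(), which is expected
# to fail with "Argument list too long".
if /bin/true $(cat "$tmp") 2>/dev/null; then external_ok=yes; else external_ok=no; fi

rm -f "$tmp"
echo "builtin: $builtin_ok, loop iterations: $count, external: $external_ok"
```

On a typical Linux system, only the /bin/true line is expected to fail; the builtin and the loop only need enough memory in the shell itself.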
It's OK with bash's true builtin command or the for loop, but not when executing /bin/true. Note how file is just 9 bytes large, but the expansion of $(cat file) is several megabytes because the /*/*/*/* glob is being expanded by the shell.

More reading at: