Bash – Why not use backticks with for loop

bash, shell-script

Some time ago, I posted an answer to a question about scripting. Someone pointed out that I shouldn't use the following command:

for x in $(cat file); do something; done 

but should instead use:

while read f; do something; done < file

The Useless Use of Cat article is supposed to explain the whole problem, but the only explanation it gives is:

The backticks are outright dangerous, unless you know the result of
the backticks is going to be less than or equal to how long a command
line your shell can accept. (Actually, this is a kernel limitation.
The constant ARG_MAX in your limits.h should tell you how much your
own system can take. POSIX requires ARG_MAX to be at least 4,096
bytes.)

If I understood this correctly, bash(?) should crash if I use the output of a very big file in a command (the expansion would exceed the ARG_MAX defined in limits.h). So I checked ARG_MAX with this command:

> grep ARG_MAX /usr/src/kernels/$(uname -r)/include/uapi/linux/limits.h
#define ARG_MAX       131072    /* # bytes of args + environ for exec() */
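
As an aside, there is no need to dig through kernel sources for this: the POSIX getconf utility should report the same limit at runtime (on this system it should match the 131072 from the header):

> getconf ARG_MAX
131072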

Then I created a file containing text with no spaces:

> ls -l
-rw-r--r--. 1 root root 100000000 Aug 21 15:37 in_file

Then I ran:

for i in $(cat in_file); do echo $i; done

aaaand nothing terrible happened.

So what should I do to check if/how this whole 'don't use cat with loop' thing is dangerous?

Best Answer

It depends on what file is meant to contain. If it's meant to contain an IFS-separated list of shell globs like (assuming the default value of $IFS):

/var/log/*.log /var/adm/*~
/some/dir/*.txt

then for i in $(cat file) would be the way to go, as that's exactly what an unquoted $(cat file) does: it applies the split+glob operator to the output of cat file, stripped of its trailing newline characters. The loop would then iterate over each filename resulting from the expansion of those globs (except where a glob matches no file, in which case the glob is left in place, unexpanded).
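
You can watch that split+glob behaviour directly (globlist is just a scratch file name for this sketch):

printf '%s\n' '/etc/*.conf /var/log/*.log' > globlist
for i in $(cat globlist); do printf '%s\n' "$i"; done
# prints one matching filename per line (whatever those globs
# happen to match on your system); a glob with no match is
# printed as-is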

If you wanted to loop over each line of file, you'd do:

while IFS= read -r line <&3; do   # IFS= and -r: take the line verbatim
{
  something with "$line"
} 3<&-                            # close fd 3 for "something"
done 3< file                      # read via fd 3, leaving stdin free
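
Reading via fd 3 matters as soon as something itself consumes standard input. A classic illustration of the failure mode (ssh and somehost are placeholders here, standing in for any stdin-reading command):

while IFS= read -r line; do
  ssh somehost "something with $line"  # ssh drains stdin, swallowing
done < file                            # the remaining lines of file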

With a for loop, you could loop over every non-empty line with:

IFS='
' # split on newlines only (actually on sequences of newlines,
  # ignoring leading and trailing ones, as newline is an
  # IFS whitespace character)
set -o noglob # disable the glob part of the split+glob operator:
for line in $(cat file); do
   something with "$line"
done
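
Since changing IFS and turning off globbing would affect the rest of the script, one option (a sketch, assuming nothing else in the loop needs the old settings) is to confine both to a subshell:

(
  IFS='
'               # split on newlines only
  set -o noglob
  for line in $(cat file); do
    something with "$line"
  done
)               # original IFS and glob settings restored on exit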

However, a:

while read line; do
  something with "$line"
done < file

makes little sense. That's reading the content of file in a very convoluted way, in which characters of $IFS and backslashes are treated specially.
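
To see that special treatment concretely (demo is just a scratch file name):

printf '%s\n' '  a\b  c  ' > demo
while read line; do printf '[%s]\n' "$line"; done < demo
# prints [ab  c]: surrounding blanks stripped, backslash removed
while IFS= read -r line; do printf '[%s]\n' "$line"; done < demo
# prints [  a\b  c  ]: the line, verbatim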

In any case, the ARG_MAX limit the quoted text refers to applies to the execve() system call (to the cumulative size of the arguments and environment variables), so it is only relevant when a command stored on the filesystem is executed with the possibly very long expansion of the split+glob operator applied to the command substitution (that quoted text is misleading and wrong on several accounts).

It would apply for instance in:

cat -- $(cat file) # with shell implementations where cat is not builtin

But not in:

for i in $(cat file)

where there's no execve() system call involved.

Compare:

bash-4.4$ echo '/*/*/*/*' > file
bash-4.4$ true $(cat file)
bash-4.4$ n=0; for f in $(cat file); do ((n++)); done; echo "$n"
523696
bash-4.4$ /bin/true $(cat file)
bash: /bin/true: Argument list too long

It's OK with bash's true builtin command or the for loop, but not when executing /bin/true. Note how the file is only 9 bytes large, yet the expansion of $(cat file) is several megabytes, because the /*/*/*/* glob is expanded by the shell.
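
If you genuinely need to hand such a huge expansion to an external command, one workaround (a sketch; it relies on printf being a shell builtin, as it is in bash, and on an xargs that supports -0, as GNU and BSD ones do) is to let xargs batch the execve() calls:

printf '%s\0' $(cat file) | xargs -0 /bin/true
# the expansion happens inside the shell (no execve() involved);
# xargs then runs /bin/true repeatedly, each time with an
# argument list that fits within ARG_MAX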
