A while loop and a here-document – what happens when?

I have this while loop and here-document combo which I run in Bash 4.3.48(1) and I don't understand its logic at all.

```
while read file; do source ~/unwe/"$file"
done <<-EOF
    x.sh
    y.sh
EOF
```

My question consists of these parts:

1. What does read do here? I always use read to declare a variable and assign its value interactively, but I'm missing what it's supposed to do here.
2. What is the meaning of while read? Where does the concept of while come in here?
3. If the here-document comes after the loop, how is it even affected by the loop? It comes after done, not inside the loop, so what's the actual association between these two structures?
4. Why does this fail?
```
while read file; do source ~/unwe/"$file" done <<-EOF
    x.sh
    y.sh
EOF
```
I mean, done is done… So why does it matter if done <<-EOF is on the same line as the loop? If I recall correctly, I once had a for loop written as a one-liner, and it still worked.

Best Answer

The read command reads a line from its standard input stream and assigns what's read to the variable file (it's a bit more complicated than that: without -r, backslashes are treated as escape characters, and leading and trailing whitespace is stripped according to IFS). The standard input stream comes from the here-document redirected into the loop after done. If standard input isn't redirected, read reads from the terminal, interactively. In this case, though, the shell has connected its input stream to the here-document.
while read will cause the loop to iterate until the read command returns a non-zero exit status. This will happen if there are any errors, or (most commonly) when there is no more data to be read (its input stream is in an end-of-file state).
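To see what read hands to the loop body on each pass, here is a minimal sketch of the same loop with source swapped for printf, so it runs without any ~/unwe scripts. The array name seen is just an illustrative choice, and the here-document is unindented to keep things simple.

```shell
#!/usr/bin/env bash
# Same loop shape as in the question, but printing instead of sourcing.
seen=()
while read -r file; do
    printf 'read assigned: %s\n' "$file"
    seen+=("$file")          # remember each value read assigned
done <<EOF
x.sh
y.sh
EOF
# After y.sh, read hit end-of-file, returned non-zero, and the loop stopped.
printf 'loop ran %d times\n' "${#seen[@]}"
```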

The convention is that any utility that wishes to signal an error, "false" or "no" to the calling shell does so by returning a non-zero exit status. A zero exit status signals "true", "yes" or "no error". If you want to inspect it, the status is available in $? (which holds the status of the most recently executed command only). The exit status may be used in if statements, while loops, or anywhere else a test is required. For example:
```
if grep -q 'pattern' file; then echo 'pattern found'; fi
```
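Here is a minimal sketch of the two statuses that drive the loop in the question: read returns 0 while it gets a line and non-zero once its input is at end-of-file. The variable names are illustrative only; /dev/null serves as an input that is already at end-of-file.

```shell
#!/usr/bin/env bash
# read succeeds (status 0) when there is a line to consume ...
read -r line <<EOF
x.sh
EOF
first_status=$?
# ... and fails (non-zero) when its input is already at end-of-file.
eof_status=0
read -r line </dev/null || eof_status=$?
printf 'with data: %d, at EOF: %d\n' "$first_status" "$eof_status"
```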
A here-document is a form of redirection. In this case, it's a redirection into the loop. Anything inside the loop could read from it, but here only the read command does. (It's worth reading up on here-documents.) If the input came from an ordinary file, the last line would have been
```
done <filename
```
Seeing the loop as one single command may make this more intuitive:
```
while ...; do ...; done <filename
```
which is one case of
```
somecommand <filename
```
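To make the "one single command" view concrete, this sketch feeds the same loop once from an ordinary file and once from a here-document and collects the same names both ways. The temporary file created with mktemp is an assumption standing in for filename.

```shell
#!/usr/bin/env bash
# Hypothetical list file standing in for "filename".
list=$(mktemp)
printf '%s\n' x.sh y.sh >"$list"

# The whole loop is one command; its stdin comes from the file ...
from_file=""
while read -r file; do from_file+="$file "; done <"$list"

# ... or, equivalently, from a here-document.
from_heredoc=""
while read -r file; do from_heredoc+="$file "; done <<EOF
x.sh
y.sh
EOF

rm -f "$list"
printf '[%s] [%s]\n' "$from_file" "$from_heredoc"
```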
Some shells (bash, ksh93 and zsh among them) also support "here-strings" with <<<"string":
```
cat <<<"This is the here-string"
```
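A here-string can drive the same loop, as in this bash sketch. It relies on $'...' (ANSI-C quoting) to put a newline between the two names; the here-string itself adds a trailing newline, so read sees two complete lines.

```shell
#!/usr/bin/env bash
# The here-string supplies "x.sh<newline>y.sh<newline>" on standard input.
count=0
while read -r file; do
    count=$((count + 1))
    printf 'got %s\n' "$file"
done <<<$'x.sh\ny.sh'
```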
DavidFoerster points out that if either of the two scripts x.sh and y.sh reads from standard input without explicitly being given data from a file or from elsewhere, the data it reads will actually come from the here-document.

With an x.sh that contains only read a, the variable a would end up holding the string y.sh, and the y.sh script would never run. This is because standard input is redirected for all commands in the while loop (and is "inherited" by any script or command they invoke), so the second line is "consumed" by x.sh before the loop's own read can get to it.
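The pitfall can be reproduced in a throwaway directory. In this sketch a mktemp directory stands in for ~/unwe, x.sh only reads a line from standard input (the || true just keeps it from failing under set -e), and y.sh merely records in the hypothetical variable ran_y that it ran.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                              # stands in for ~/unwe
printf 'read -r a || true\n' >"$dir/x.sh"     # x.sh reads from its stdin
printf 'ran_y=yes\n' >"$dir/y.sh"             # y.sh just records that it ran
ran_y=no
while read -r file; do
    source "$dir/$file"                       # sourced script inherits the loop's stdin
done <<EOF
x.sh
y.sh
EOF
rm -r "$dir"
# x.sh swallowed the line "y.sh", so the loop never got to source y.sh.
printf 'a=%s ran_y=%s\n' "$a" "$ran_y"
```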

If this behaviour is unwanted, it can be avoided, but it's a bit tricky.
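One common way around it (not necessarily the approach the answer had in mind) is to deliver the list on a different file descriptor, so the scripts' standard input is left alone. This sketch uses descriptor 3 with the same hypothetical x.sh and y.sh; the </dev/null on source exists only to keep the demo non-interactive, since in real use x.sh would read from the terminal.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                              # stands in for ~/unwe
printf 'read -r a || true\n' >"$dir/x.sh"
printf 'ran_y=yes\n' >"$dir/y.sh"
ran_y=no
# The list arrives on descriptor 3; the loop's read takes it from there.
while read -r file <&3; do
    source "$dir/$file" </dev/null            # demo only: give x.sh an empty stdin
done 3<<EOF
x.sh
y.sh
EOF
rm -r "$dir"
printf 'ran_y=%s\n' "$ran_y"
```

In bash, read -u 3 file does the same as read file <&3.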
As for the last question: the one-liner fails because there is no ; or newline before done. Without one, the word done is taken as an argument to source, and the loop is never closed, which is a syntax error.

Almost any ; may be replaced by a newline (at least when it acts as a command delimiter). It marks the end of a command, as do |, &, && and || (and probably others that I have forgotten).
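Here is a sketch of the one-liner with the missing ; added. The commented lines show the question's version and its repaired form; the runnable part swaps source for string building so it needs no scripts. The same rule explains the for loop one-liner that worked: it must have had a ; before done.

```shell
#!/usr/bin/env bash
# Broken (done is just another argument to source):
#   while read file; do source ~/unwe/"$file" done <<-EOF
# Fixed (";" ends the source command, so done is seen as a keyword):
#   while read file; do source ~/unwe/"$file"; done <<-EOF
out=""
while read -r file; do out+="$file,"; done <<EOF
x.sh
y.sh
EOF
# A for loop one-liner follows the same rule.
for f in x.sh y.sh; do out+="$f;"; done
printf '%s\n' "$out"
```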

Related Solutions

Bash – How to Signal End-of-Input to ‘read -N’

It might be more helpful if the documentation pointed out that there's no such thing as an ASCII EOF. The ASCII meaning of ^D is EOT (end of transmission), which is what the terminal driver supplies in canonical mode: it ends the current transmission, that is, the current read. Programs interpret a zero-length read as EOF, because that's what EOF looks like on files, which really do have an end. But the terminal driver refusing to deliver character code 4, and instead swallowing it and terminating the read, isn't always what you want.

That's what's going on here: control character semantics are part of canonical mode, the mode in which the terminal driver buffers input until it sees a character to which convention assigns a special meaning. This is true of EOT, BS, CR and a host of others (see stty -a and man termios for all the gory details).

read -N is an explicit order to just deliver the next N characters. To do that, the shell has to stop asking the terminal driver for canonical semantics.
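The "just deliver the next N characters" behaviour can be seen without a terminal. In this sketch read -N is fed from a here-string, so it says nothing about ^D handling, which needs a real terminal. IFS= and -r are there so read neither trims whitespace nor interprets backslashes.

```shell
#!/usr/bin/env bash
# -N 5: take exactly five characters; the newline gets no special treatment.
IFS= read -r -N 5 chunk <<<$'ab\ncd\nef'
printf '%q\n' "$chunk"     # shows the embedded newline in quoted form
```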

By the way, EOF isn't actually a condition a terminal can set, or enter.

If you keep reading past EOF on anything else, you'll keep getting the EOF indication, but the only EOF the terminal driver can supply is a fake one. Think about it: if the terminal driver actually delivered a real EOF, the shell couldn't keep reading from it afterwards either, because it's all the same terminal. Here:

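```c
#include <unistd.h>
#include <stdio.h>

/* Deliberately small buffer: the terminal driver still buffers whole lines. */
static char s[32];

int main(void)
{
    ssize_t n;
    do {
        n = read(0, s, sizeof s);       /* 0 on ^D in canonical mode */
        if (n >= 0)
            printf("%zd,%.*s\n", n, (int)n, s);
    } while (n >= 0);                   /* keep reading past the "fake" EOF; meant for a terminal */
    return 0;
}
```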

Try that on a terminal. You'll see that in canonical mode the terminal driver just interprets EOT as completing any outstanding read, and that it buffers input internally until it sees a canonical input terminator, regardless of the read buffer size (type a line longer than 32 bytes).

The text that's confusing you, "unless EOF is encountered", is referring to a real EOF.

Shell – What does POSIX require for quoted here documents inside command substitution

This was asked on Bash's mailing list, and the maintainer confirmed it was a bug.

They also mentioned that the text in POSIX "is not necessarily ambiguous, but it does require close reading", so I asked for a clarification. Their answer, including a description of the issue and an interpretation of the standard, was as follows:

The command substitution is a red herring; it's relevant only in that it pointed out where the bug was.

The delimiter to the here-document is quoted, so the lines are not expanded. In this case, the shell reads lines from the input as if they were quoted. If a backslash appears in a context where it is quoted, it does not act as an escape character (see below), and the special handling of backslash-newline does not take place. In fact, if any part of the delimiter is quoted, the here-document lines are read as if single-quoted.

The text in POSIX 2.2.1 is written awkwardly, but means that the backslash is only treated specially when it's not quoted. You can quote a backslash and inhibit all expansion only with single quotes or another backslash.

The close reading part is the "not expanded" text implying the single quotes. The standard says in 2.2 that here documents are "another form of quoting," but the only form of quoting in which words are not expanded at all is single quotes. So it's a form of quoting that is just about exactly like single quotes, but not single quotes.
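The rule described above can be checked in bash with a sketch like this one: with an unquoted delimiter, $name expands and backslash-newline joins the lines; with a quoted delimiter, the body comes through literally, as if single-quoted. It uses the bash builtin mapfile instead of command substitution, to stay clear of the bug discussed here; the variable names are illustrative.

```shell
#!/usr/bin/env bash
name=world

# Unquoted delimiter: expansion happens and "\<newline>" is removed.
mapfile -t unquoted_lines <<EOF
Hello, $name \
again
EOF

# Quoted delimiter: no expansion, and the backslash-newline stays as-is.
mapfile -t quoted_lines <<'EOF'
Hello, $name \
again
EOF

printf '%s\n' "${unquoted_lines[@]}" --- "${quoted_lines[@]}"
```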
