It might be more helpful if the doc pointed out that there's no such thing as an ASCII EOF. The ASCII meaning of ^D is EOT (end of transmission, character code 4), and that is what the terminal driver implements in canonical mode: it ends the current transmission, that is, the current `read`. Programs interpret a 0-length read as EOF, because that's what EOF looks like on files that actually have an end. But the terminal driver refusing to deliver character code 4, and instead swallowing it and terminating the read, isn't always what you want.
That's what's going on here: control character semantics are part of canonical mode, the mode where the terminal driver buffers input until it sees a character to which convention assigns a special meaning. This is true of EOT, BS, CR and a host of others (see `stty -a` and `man termios` for all the gory details).
`read -N` is an explicit order to just deliver the next N characters. To do that, the shell has to stop asking the terminal driver for canonical semantics.
By the way, EOF isn't actually a condition a terminal can set, or enter.
If you keep reading past EOF on anything else, you'll keep getting the EOF indicator, but the only EOF the terminal driver can supply is a fake one. Think about it: if the terminal driver actually delivered a real EOF, then the shell couldn't keep reading from it afterwards either. It's all the same terminal. Here:
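```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char s[32];
    ssize_t n;

    /* Keep reading after a 0-length read: on a terminal, ^D only
       completes the pending read, so later reads still get input.
       Stop it with ^C. */
    while ((n = read(0, s, sizeof s)) >= 0)
        printf("%zd,%.*s\n", n, (int)n, s);

    perror("read");
    return 1;
}
```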
Try that on the terminal. You'll see that the terminal driver in canonical mode just interprets EOT as a request to complete any outstanding read, and that it buffers internally until it sees some canonical input terminator, regardless of the read buffer size (type a line longer than 32 bytes).
The text that's confusing you, "unless EOF is encountered", is referring to a real EOF.
This was asked on Bash's mailing list, and the maintainer confirmed it was a bug.
They also mentioned that the text in POSIX "is not necessarily ambiguous, but it does require close reading.", so I asked for a clarification on that. Their answer, including a description of the issue and an interpretation of the standard, was as follows:
The command substitution is a red herring; it's relevant only in that it pointed out where the bug was.
The delimiter to the here-document is quoted, so the lines are not expanded. In this case, the shell reads lines from the input as if they were quoted. If a backslash appears in a context where it is quoted, it does not act as an escape character (see below), and the special handling of backslash-newline does not take place. In fact, if any part of the delimiter is quoted, the here-document lines are read as if single-quoted.
The text in Posix 2.2.1 is written awkwardly, but means that the backslash is only treated specially when it's not quoted. You can quote a backslash and inhibit all expansion only with single quotes or another backslash.
The close reading part is the "not expanded" text implying the single quotes. The standard says in 2.2 that here documents are "another form of quoting," but the only form of quoting in which words are not expanded at all is single quotes. So it's a form of quoting that is just about exactly like single quotes, but not single quotes.
The `read` command reads from its standard input stream and assigns what's read to the variable `file` (it's a bit more complicated than that, see long discussion here). The standard input stream is coming from the here-document redirected into the loop after `done`. If not given data from anywhere, it will read from the terminal, interactively. In this case though, the shell has arranged to connect its input stream to the here-document.

`while read` will cause the loop to iterate until the `read` command returns a non-zero exit status. This will happen if there are any errors, or (most commonly) when there is no more data to be read (its input stream is in an end-of-file state).

The convention is that any utility that wishes to signal an error or "false" or "no" to the calling shell does so by returning a non-zero exit status. A zero exit status signals "true" or "yes" or "no error". This status, should you wish to inspect it, is available in `$?` (only from the last executed utility). The exit status may be used in `if` statements and `while` loops or anywhere where a test is required. For example:

A here-document is a form of redirection. In this case, it's a redirection into the loop. Anything inside the loop could read from it, but in this case it's only the `read` command that does. Do read up on here-documents. If the input was coming from an ordinary file, the last line would have been:

Seeing the loop as one single command may make this more intuitive:
which is one case of
Some shells also support "here-strings" with `<<<"string"`:

DavidFoerster points out that if either of the two scripts `x.sh` and `y.sh` reads from standard input, without explicitly being given data to read from a file or from elsewhere, the data read will actually come from the here-document.

With an `x.sh` that contains only `read a`, this would make the variable `a` contain the string `y.sh`, and the `y.sh` script would never run. This is due to the fact that the standard input is redirected for all commands in the `while` loop (and also "inherited" by any invoked script or command), and the second line is "consumed" by `x.sh` before the `while` loop's `read` can read it.

If this behaviour is unwanted, it can be avoided, but it's a bit tricky.
It fails because there is no `;` or newline before `done`. Without a `;` or newline before `done`, the word `done` will be taken as an argument of `source`, and the loop will additionally not be properly closed (this is a syntax error).

It is almost true that any `;` may be replaced by a newline (at least when it's a command delimiter). It signals the end of a command, as do `|`, `&`, `&&` and `||` (and probably others that I have forgotten).