Bash – How to Signal End-of-Input to ‘read -N’

bash, ksh, read, tty, user input

I've been trying to figure out why I get a literal end-of-transmission character (EOT, ASCII code 4) in my variable if I read Ctrl+D with read -N 1 in bash and ksh93.

I'm aware of the distinction between the end-of-transmission character and the end-of-file condition, and I know what Ctrl+D does when using read without -N (it sends EOT, and if the input was empty, the underlying read() returns zero, signalling EOF).

But I'm not sure why trying to read a specific number of characters changes this behaviour so radically. I would have expected an EOF condition and that the following loop would exit:

while read -N 1 ch; do
  printf '%s' "$ch" | od
done

Output when pressing Ctrl+D:

0000000  000004
0000001

The bash manual says about read -N (ksh93 has a similar wording):

-N nchars
read returns after reading exactly nchars characters rather than waiting for a complete line of input, unless EOF is encountered or read times out.

… but it says nothing about switching the TTY to raw/unbuffered mode (which is what I assume is happening).

The -n option to read seems to behave the same way with regard to Ctrl+D, and the number of characters to read doesn't seem to matter either.

How may I signal an end-of-input to read -N and exit the loop (other than testing the value that was read), and why is this different from a "bare" read?

Best Answer

It might be more helpful if the doc pointed out that there's no such thing as an ASCII EOF. The ASCII semantics for ^D is EOT, and that is what the terminal driver implements in canonical mode: it ends the current transmission, i.e. the read. Programs interpret a 0-length read as EOF, because that's what EOF looks like on files that have an end, but having the terminal driver swallow character code 4 and terminate the read, instead of delivering the character, isn't always what you want.

That's what's going on here: control character semantics are part of canonical mode, the mode where the terminal driver buffers until it sees a character to which convention assigns a special meaning. This is true of EOT, BS, CR and a host of others (see stty -a and man termios for all the gory details).

read -N is an explicit order to just deliver the next N characters. To do that, the shell has to stop asking the terminal driver for canonical semantics.
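Since the shell hands you the literal character in this mode, one way to end such a loop is to compare what was read against EOT yourself. A minimal sketch of that value-testing approach, assuming bash; `read_until_eot` is a hypothetical helper name, not something the shell provides:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read stdin one character at a time and stop at a
# literal EOT (ASCII 4) or at a real end-of-file, whichever comes first.
read_until_eot() {
    local ch out=
    # IFS= and -r keep whitespace and backslashes intact.
    while IFS= read -r -N 1 ch; do
        [[ $ch == $'\004' ]] && break   # treat EOT as "end of input"
        out+=$ch
    done
    printf '%s' "$out"
}

# Simulate Ctrl+D in the middle of the input by piping in a literal EOT.
printf 'ab\004cd' | read_until_eot
```

Fed through a pipe like this, the function prints `ab` and ignores everything after the EOT. On a terminal, pressing Ctrl+D should end it the same way, because read -N receives the character instead of an EOF.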

By the way, EOF isn't actually a condition a terminal can set, or enter.

If you keep reading past EOF on anything else, you'll keep getting the EOF indicator, but the only EOF the terminal driver can supply is a fake one. Think about it: if the terminal driver actually delivered a real EOF, then the shell couldn't keep reading from it afterwards either. It's all the same terminal. Here:

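#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char s[32];
    int n;

    /* Keep reading even after a 0-length read; stop only on error.
       On a terminal, interrupt with Ctrl+C to quit. */
    do {
        n = (int)read(0, s, sizeof s);
        /* Print the byte count, then exactly the bytes that were read. */
        printf("%d,%.*s\n", n, n > 0 ? n : 0, s);
    } while (n >= 0);
    return 0;
}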

Try that on a terminal. You'll see that in canonical mode the terminal driver just interprets EOT as "complete any outstanding read" (on an empty line that is a 0-length read, after which the program carries on reading), and that it buffers internally until it sees some canonical input terminator, regardless of the read buffer size (type a line longer than 32 bytes to check).
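For contrast, here is a small bash sketch of the "keep getting the EOF indicator" behaviour on something that is not a terminal. `read_statuses` is a hypothetical helper; it simply attempts three reads on stdin and reports the exit status of each.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attempt three reads on stdin and print the exit
# status of each attempt on one line.
read_statuses() {
    local line i
    local -a statuses=()
    for i in 1 2 3; do
        IFS= read -r line
        statuses+=("$?")          # exit status of the read just above
    done
    printf '%s\n' "${statuses[*]}"
}

# /dev/null is at EOF from the start, so every read fails the same way.
read_statuses </dev/null
```

This prints `1 1 1`: once a file or pipe is exhausted, every further read reports EOF again, whereas on a terminal the next read after Ctrl+D goes back to waiting for input.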

The text that's confusing you,

unless EOF is encountered

is referring to a real EOF.
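A real EOF is what you get when stdin is a pipe or a file that has run out. A sketch (bash assumed, and `count_chars` is a made-up name) suggesting that the loop from the question does exit in that case:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many single-character reads succeed
# before read -N 1 reports end-of-file.
count_chars() {
    local ch n=0
    while IFS= read -r -N 1 ch; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

# The pipe closes after three bytes, so the fourth read hits a real EOF.
printf 'abc' | count_chars
```

Here the loop runs three times and then stops on its own, with no need to inspect the value that was read.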
