Shell Read – How to Differentiate Between EOF and Newline in Shell Read?

bash posix shell

When reading a single character, how can I tell the difference between end of file (<EOF>) and a newline (\n)?

For example:

f() { read -rn 1 -p "Enter a character: " char &&
      printf "\nYou entered '%s'\n" "$char"; }

With a printable character:

$ f
Enter a character: x
You entered 'x'

When pressing Enter:

$ f
Enter a character: 

You entered ''

When pressing Ctrl + D:

$ f
Enter a character: ^D
You entered ''
$ 

Why is the output the same in the last two cases? How can I distinguish between them?

Is there a different way to do this in POSIX shell vs bash?

Best Answer

With read -n "$n" (not a POSIX feature), if stdin is a terminal device, read takes the terminal out of icanon (canonical) mode. Otherwise read would only see full lines, as returned by the line editor built into the terminal line discipline. It then reads one byte at a time until $n characters or a newline have been read (you may see unexpected results if invalid characters are entered).

It reads up to $n characters from one line. You'll also need to empty $IFS so that it doesn't strip IFS characters from the input.

Since we leave the icanon mode, ^D is no longer special. So if you press Ctrl+D, the ^D character will be read.
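In that mode, one possible way to get EOF-like behaviour back is to check for that byte yourself. The sketch below is not from the original answer: read_key is a hypothetical name, it relies on bash-only features ($'\004' quoting and read -n), and the piped input only stands in for what a terminal in non-canonical mode would deliver.

```shell
# Hypothetical helper (bash): read one character into $key.
# Returns 0 for a character or a newline (key is then empty),
# and 1 for a real EOF or a literal ^D byte.
read_key() {
  IFS= read -rn 1 key || return 1   # real EOF (empty pipe, /dev/null, ...)
  if [ "$key" = $'\004' ]; then     # Ctrl+D outside canonical mode is just byte 0x04
    return 1
  fi
  return 0
}

printf 'x'    | { read_key; echo "status=$? key=[$key]"; }   # status=0 key=[x]
printf '\n'   | { read_key; echo "status=$? key=[$key]"; }   # status=0 key=[]
printf '\004' | { read_key; echo "status=$?"; }              # status=1
```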

You wouldn't see EOF from the terminal device unless the terminal is somehow disconnected. If stdin is another type of file, you may see EOF (as in : | IFS= read -rn 1; echo "$?", where stdin is an empty pipe, or when stdin is redirected from /dev/null).
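A small bash sketch of that non-terminal case, comparing an empty pipe and /dev/null with a lone newline: EOF gives a non-zero status (1 in bash), while the newline gives status 0 and an empty variable.

```shell
# bash: exit status of read -n 1 on EOF versus on a newline
: | IFS= read -rn 1 c;         echo "empty pipe: $?"                # EOF: 1
IFS= read -rn 1 c < /dev/null; echo "/dev/null:  $?"                # EOF: 1
printf '\n' | { IFS= read -rn 1 c; echo "newline:    $? [$c]"; }   # 0, c is empty
```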

read will return 0 if $n characters (each byte that isn't part of a valid character counting as one character) or a full line have been read, and a non-zero status if it reaches EOF first.
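To illustrate those rules in bash with piped input and n=5: note that when EOF comes first, read still stores what it managed to read but reports failure.

```shell
# bash: exit status of read -n 5 depending on what comes before EOF
printf 'abcdef' | { IFS= read -rn 5 v; echo "$? [$v]"; }   # 0 [abcde]: 5 characters read
printf 'ab\ncd' | { IFS= read -rn 5 v; echo "$? [$v]"; }   # 0 [ab]: a full line read
printf 'ab'     | { IFS= read -rn 5 v; echo "$? [$v]"; }   # 1 [ab]: EOF before 5 chars or newline
```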

So, in the special case of only one character being requested:

if IFS= read -rn 1 var; then
  if [ "${#var}" -eq 0 ]; then
    echo an empty line was read
  else
    printf %s "${#var} character "
    (export LC_ALL=C; printf '%s\n' "made of ${#var} byte(s) was read")
  fi
else
  echo "EOF found"
fi

Doing it POSIXly is rather complicated.

That would be something like the following (assuming an ASCII-based system, as opposed to EBCDIC for instance):

readk() {
  REPLY= ret=1
  if [ -t 0 ]; then
    saved_settings=$(stty -g)
    stty -icanon min 1 time 0 icrnl
  fi
  while true; do
    code=$(dd bs=1 count=1 2> /dev/null | od -An -vto1 | tr -cd 0-7)
    [ -n "$code" ] || break
    case $code in
      000 | 012) ret=0; break;; # can't store NUL in variable anyway
      (*) REPLY=$REPLY$(printf "\\$code");;
    esac
    if expr " $REPLY" : ' .' > /dev/null; then
      ret=0
      break
    fi
  done
  if [ -t 0 ]; then
    stty "$saved_settings"
  fi
  return "$ret"
}
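The heart of readk is the dd | od | tr pipeline, which turns the next byte of input into its three-digit octal code, or into nothing at all at EOF; printf "\\$code" then turns that code back into the byte. The sketch below isolates the pipeline in a hypothetical byte_code function so it can be tried with piped input; it is plain POSIX sh.

```shell
# Hypothetical helper: print the octal code of the next byte on stdin,
# or nothing if stdin is at EOF (which is how readk detects EOF).
byte_code() {
  dd bs=1 count=1 2> /dev/null | od -An -vto1 | tr -cd 0-7
}

printf 'A'  | byte_code; echo   # 101
printf '\n' | byte_code; echo   # 012 (newline: readk stops here with status 0)
: | byte_code; echo             # empty line: EOF

# readk rebuilds the byte from its code via printf's octal escape:
code=101; printf "\\$code"; echo   # A
```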

Note that we return only when a full character has been read. If the input is in the wrong encoding (different from the locale's encoding), for instance if your terminal sends é encoded in ISO-8859-1 (0xe9) when we expect UTF-8 (0xc3 0xa9), you may enter as many é as you like and the function will not return. bash's read -n1 would return upon the second 0xe9 (and store both bytes in the variable), which is slightly better behaviour.

If you also wanted to read a ^C character upon Ctrl+C (instead of letting it kill your script; likewise for ^Z, ^\...), or ^S/^Q upon Ctrl+S/Q (instead of flow control), you could add -isig -ixon to the stty line. Note that bash's read -n1 doesn't do it either (it even restores isig if it was off).

That will not restore the tty settings if the script is killed (as when you press Ctrl+C). You could add a trap, but that would potentially override other traps in the script.
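If that trade-off is acceptable, a minimal POSIX sh sketch could look like the following. restore_tty and saved_tty are hypothetical names, and the traps replace any EXIT, HUP, INT or TERM traps the script had already set.

```shell
# Hypothetical sketch: save the tty settings and restore them however the script ends.
# Caveat: this replaces any EXIT/HUP/INT/TERM traps set earlier in the script.
restore_tty() {
  if [ -n "${saved_tty:-}" ]; then
    stty "$saved_tty"
    saved_tty=
  fi
}

if [ -t 0 ]; then
  saved_tty=$(stty -g)
  trap restore_tty EXIT
  trap 'exit 129' HUP    # calling exit from these traps also runs the EXIT trap
  trap 'exit 130' INT
  trap 'exit 143' TERM
  stty -icanon min 1 time 0 icrnl   # add -isig -ixon to read ^C, ^Z, ^S, ^Q as characters
fi
```

SIGKILL still cannot be trapped, so the terminal can be left in non-canonical mode in that case.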

You could also use zsh instead of bash. There, read -k (which predates ksh93's and bash's read -n/-N) reads one character from the terminal, handles ^D by itself (returning non-zero if that character is entered) and doesn't treat newline specially.

if read -k k; then
  printf '1 character entered: %q\n' $k
fi