Bash Shell – How to Capture ‘\n’ (Newline) Character in Bash Read?

bash, shell

Can I use read to capture the \n \012 or newline character?

Define test function:

f() { read -rd '' -n1 -p "Enter a character: " char &&
      printf "\nYou entered: %q\n" "$char"; }

Run the function, press Enter:

$ f;
Enter a character: 

You entered: ''

Hmmm. It's a null string.

How do I get my expected output:

$ f;
Enter a character:

You entered: $'\012'
$

I want the same method to be able to capture ^D or \004.

If read can't do it, what is the work around?

Best Answer

To read 1 character, use -N instead, which reads one character always and doesn't do $IFS processing:

read -rN1 var

read -rn1 reads one record of at most one character and still does $IFS processing. Newline is in the default value of $IFS, which explains why you get an empty result even though you read NUL-delimited records. -n is meant for limiting the length of the record you read.

In your case, with NUL-delimited records (-d ''), IFS= read -d '' -rn1 var would work just as well: bash cannot store a NUL character in its variables anyway, so printf '\0' | read -rN1 var leaves $var empty and returns a non-zero exit status.
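As a rough sketch of how the function from the question could look with -N1 (read_char is a made-up name, and this assumes bash 4.1 or later, where -N is available):

```shell
#!/usr/bin/env bash
# read_char: hypothetical helper name, not part of the original question.
# Reads exactly one character from stdin and prints it with %q quoting.
read_char() {
  local char
  # IFS= and -r turn off field splitting and backslash processing;
  # -N1 ignores delimiters, so a newline is returned like any other character.
  IFS= read -rN1 char || return 1
  printf 'You entered: %q\n' "$char"
}
```

Fed from a pipe, printf '\n' | read_char should print You entered: $'\n' (bash's %q writes newline as \n rather than \012), and printf '\004' | read_char should print You entered: $'\004'. Adding -p "Enter a character: " to the read call brings the prompt back for interactive use.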

To be able to read arbitrary characters including NUL, you'd use the zsh shell instead where the syntax is:

read -k1 'var?Enter a character: '

There is no need for -r or IFS= there. However, note that read -k reads from the terminal (k is for key; zsh's -k option predates bash's and even ksh93's -N by decades). To read from stdin, use read -u0 -k1.

Example (here pressing Ctrl+Space to enter a NUL character):

$ read -k1 'var?Enter a character: '
Enter a character: ^@
$ printf '%q\n' $var
$'\0'

Note that to be able to read a character, read may have to read more than one byte. If the input starts with the first byte of a multi-byte character, it will read at least one more byte. So if the input contains byte sequences that don't form valid characters, you could end up with $var containing something that the shell considers to have a length greater than 1.

For instance in a UTF-8 locale:

$ printf '\xfc\x80\x80\x80\x80XYZ' | bash -c 'read -rN1 a; printf "<%q>\n" "$a" "${#a}"; wc -c'
<$'\374\200\200\200\200X'>
<6>
2
$ printf '\xfc\x80\x80\x80\x80XYZ' | zsh -c 'read -u0 -k1 a; printf "<%q>\n" $a $#a; wc -c'
<$'\374'$'\200'$'\200'$'\200'$'\200'X>
<6>
2

In UTF-8, 0xFC is the first byte of a 6-byte character; the 5 bytes that follow are meant to have the 8th bit set and the 7th bit unset, but we provide only 4. read still reads the extra X while trying to find the end of the character, so X ends up in $var along with the 5 bytes that don't form a valid character, each of which is counted as one character.
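If what you actually want is one byte rather than one character, one possible approach (not from the answer above, so treat it as an assumption) is to do the read in the C locale, where every byte counts as a character. read_one_byte is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# read_one_byte: hypothetical helper, a sketch only.
# In the C locale each byte is a character, so -N1 consumes exactly one byte.
read_one_byte() {
  local LC_ALL=C byte          # locale change is limited to this function
  IFS= read -rN1 byte || return 1
  # Same reporting style as the examples above: quoted value, then length.
  printf '<%q>\n' "$byte" "${#byte}"
}
```

With the same input as above, printf '\xfc\x80\x80\x80\x80XYZ' | read_one_byte should report a length of 1 and leave the remaining 7 bytes unread. NUL bytes are still not handled, as bash cannot store them.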

Related Solutions

Bash – Reading Character by Character with Read

You need to remove whitespace characters from the $IFS parameter for read to stop skipping leading and trailing ones (with -n1, the whitespace character if any would be both leading and trailing, so skipped):

while IFS= read -rn1 a; do printf %s "$a"; done

But even then bash's read will skip newline characters, which you can work around with:

while IFS= read -rn1 a; do printf %s "${a:-$'\n'}"; done

Alternatively, you could use IFS= read -d '' -rn1 or, even better, IFS= read -N1 (added in bash 4.1, copied from ksh93, which added it in version o), which is the command to read one character.
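For illustration, the loop written with -N1 might look like the sketch below. copy_chars is a made-up name, bash 4.1 or later is assumed, and the input is assumed to contain no NUL bytes (the loop would stop at the first one):

```shell
#!/usr/bin/env bash
# copy_chars: hypothetical helper that copies stdin to stdout
# one character at a time.
copy_chars() {
  local c
  # -N1 returns whitespace and newlines unchanged, so no ${c:-...}
  # workaround is needed; read fails at end of input and ends the loop.
  while IFS= read -rN1 c; do
    printf '%s' "$c"
  done
}
```

Something like printf ' a b\n\nc \n' | copy_chars should reproduce the input exactly, including leading blanks and empty lines.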

Note that bash's read can't cope with NUL characters. And ksh93 has the same issues as bash.

With zsh:

while read -ku0 a; do print -rn -- "$a"; done

(zsh can cope with NUL characters).

Note that those read -k/n/N read a number of characters, not bytes. So for multibyte characters, they may have to read multiple bytes until a full character is read. If the input contains invalid characters, you may end up with a variable that contains a sequence of bytes that doesn't form valid characters and which the shell may end up counting as several characters. For instance in a UTF-8 locale:

$ printf '\375\200\200\200\200ABC' | bash -c '
    IFS= read -rN1 a; echo "${#a}"'
6

That \375 would introduce a 6-byte UTF-8 character. However, the 6th byte above (A) is not valid inside a UTF-8 character. You still end up with \375\200\200\200\200A in $a, which bash counts as 6 characters even though the first 5 are not really characters, just 5 bytes that don't form part of any character.

Bash Shell Newlines – How to Preserve Newline Character in Command Output

It is a known flaw of command substitution, $(...) or `...`, that all trailing newlines are trimmed.

If that is your problem, append a sentinel character and strip it afterwards:

$ output="$(head -- "$file"; echo x)"     ### capture the text with an x added.
$ output="${output%?}"                    ### remove the last character (the x).

This gives output the correct value, trailing newlines included.
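The same trick can be wrapped in a small function. This is only a sketch: capture_exact and its calling convention (variable name first, then the command) are invented for illustration, and the exit status of the command is not preserved:

```shell
#!/usr/bin/env bash
# capture_exact: hypothetical helper.
# Usage: capture_exact VARNAME command [args...]
# Stores the command's output in VARNAME without losing trailing newlines.
capture_exact() {
  local __name=$1 __out
  shift
  # The trailing x protects the newlines from command substitution.
  __out=$("$@"; echo x)
  # Strip the sentinel by name (%x) rather than with %?, then assign
  # to the caller's variable with printf -v.
  printf -v "$__name" '%s' "${__out%x}"
}
```

For example, after capture_exact out printf 'a\n\n', ${#out} should be 3: the letter plus both newlines.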
