BASH print question (printf \\$(printf '%03o' $1))

Tags: ascii, bash, linux, printf

I used the following to convert between an integer and its ASCII character in bash, but I do not understand how printf \\$(printf '%03o' $1) and printf '%d' "'$1" work. Please explain how they work.

#!/bin/bash
# chr() - converts decimal value to its ASCII character representation
# ord() - converts ASCII character to its decimal value

chr() {
  printf \\$(printf '%03o' $1)
}

ord() {
  printf '%d' "'$1"
}

ord A
echo
chr 65
echo

Best Answer

printf '\101', where 101 is an octal number, outputs the byte with that value.

When sent to an ASCII terminal, that will be rendered as A as A is character 65 (octal 101) in ASCII and all ASCII-compatible character sets (which includes most modern charsets with the exception of the EBCDIC ones still used on some IBM systems).
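A quick way to see this (od is used here only to show the numeric byte value):

```shell
# \101 is octal for decimal 65, so printf emits that single byte,
# which an ASCII terminal renders as A
printf '\101\n'   # prints A
# dump the same byte as an unsigned decimal to confirm it is 65
printf '\101' | od -An -tu1
```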

In

printf \\$(printf '%03o' $1)

which should have been written as:

printf "\\$(printf '%03o' "$1")"

as leaving parameter expansions (like $1) or command substitutions ($(...)) unquoted invokes the split+glob operator in Bourne-like shells, which is not wanted here.

  • printf '%03o' "$1" converts the number in $1 to a 3 digit octal
  • printf "\\$(...)" appends that octal to a \ (\\ inside double quotes becomes \) and passes that to printf so it will output the corresponding byte value.

Note that it only works in locales where the charset is one byte per character (like iso8859-1) or, in locales with a multi-byte charset, only for values 0 to 127.

In bash,

printf '%d\n' "'A"

prints the Unicode code-point of character A (or at least the value returned by mbtowc() which on GNU systems at least is the Unicode code-point).

Some other implementations (including the standalone GNU printf utility) instead return the value of the first byte of the character.

For ASCII characters like A and on ASCII-based systems, that doesn't make any difference, but for others it matters. For instance the Greek α character (U+03B1) is encoded as:

  • byte 225 in iso8859-7 (the standard Greek single-byte charset)
  • bytes 206 177 in UTF-8 (the most commonly used encoding of Unicode on Unix-like systems)
  • bytes 166 193 in GB18030 (the official Chinese encoding of Unicode).
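You can check those byte values against whatever encoding your own system uses; for example, in a UTF-8 environment (od dumps the bytes as unsigned decimals):

```shell
# printf %s outputs the character's bytes verbatim;
# in UTF-8, α is the two-byte sequence 206 177
printf %s 'α' | od -An -vtu1
```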

Bash's printf '%d\n' "'α" will always output 945 (0x03b1 in hexadecimal), which is the Unicode code point of α regardless of the locale (at least on GNU systems), but others may return 225, 206 or 166 depending on the locale.

You can see from that that those chr and ord are the reverse of each other only for ASCII characters (values 0 to 127), or, in locales using the iso8859-1 character set, for all characters (values 0 to 255).
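A quick round-trip check of the original chr and ord within the ASCII range (the functions are redefined here so the example is self-contained):

```shell
# the original definitions from the question
chr() { printf "\\$(printf '%03o' "$1")"; }
ord() { printf '%d' "'$1"; }

ord A; echo          # prints 65
chr 65; echo         # prints A
chr "$(ord Z)"; echo # round-trips back to Z within ASCII
```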

If ord() is meant to return the Unicode code point, then the reverse (print the character corresponding to a Unicode code point) would be:

chr() {
  printf "\U$(printf %08X "$1")"
}

(assuming bash 4.3 or above; \UXXXXXXXX was added in 4.2, but didn't work properly for characters U+0080 to U+00FF until 4.3).

Then, in any locale:

$ ord α
945
$ chr 945
α

Or for ord() to return the values of the bytes of the encoding of a given character (in the current locale):

ord() {
  printf %s "$1" | od -An -vtu1
}

And for chr() to output those bytes:

chr() {
  printf "$(printf '\\%o' "$@")"
}

Then, in a UTF-8 locale for instance:

$ ord α
 206 177
$ chr 206 177
α

(your ord α would give 945, your chr would give garbage for both chr 945 and chr 206 177).

Or in a locale using iso8859-7:

$ ord α
 225
$ chr 225
α

(your ord α would give 945, though could give 225 if printf was replaced with /usr/bin/printf if on a GNU system).
