Bash – printf: multibyte characters

bash, character encoding, printf

When formatting printf output for strings that contain multi-byte characters, I found that printf pads by the number of bytes rather than the number of characters, which makes it hard to align text when single-byte and multi-byte characters are mixed. For example:

$ cat script
#!/bin/bash
declare -a a b
a+=("0")
a+=("00")
a+=("000")
a+=("0000")
a+=("00000")
b+=("0")
b+=("├─00")
b+=("├─000")
b+=("├─0000")
b+=("└─00000")
printf "%-15s|\n" "${a[@]}" "${b[@]}"

$ ./script
0              |
00             |
000            |
0000           |
00000          |
0              |
├─00       |
├─000      |
├─0000     |
└─00000    |
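
As a rough check of the byte-counting explanation, the sketch below measures the byte length of one of the strings above. The helper name byte_len is made up for illustration, and the numbers assume the script is saved as UTF-8, where ├ and ─ take three bytes each.

```bash
#!/bin/bash
# byte_len is a hypothetical helper, not part of bash.
# wc -c counts bytes regardless of locale; $(( )) strips any padding wc adds.
byte_len() {
    echo $(( $(printf '%s' "$1" | wc -c) ))
}

s="├─00"
bytes=$(byte_len "$s")
# "%-15s" pads up to 15 bytes, so the padding is 15 minus the byte count
printf 'bytes=%d padding=%d\n' "$bytes" "$(( 15 - bytes ))"
# prints: bytes=8 padding=7
```

That agrees with the seven spaces that follow ├─00 in the output above.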

I found various suggested work-arounds (mainly wrappers using another language or utility to print the text). Are there any native bash solutions? None of the documented printf format strings appear to help. Would the locale settings be relevant in this situation, e.g., to use a fixed-width character encoding like UTF-32?

Best Answer

You could work around it by telling the terminal to move the cursor to the desired column, instead of having printf count the characters:

$ printf "%s\033[10G-\n" "abc" "├─cd" "└──ef"
abc      -
├─cd     -
└──ef    -

Well, assuming you're printing to a terminal, that is...

The control sequence there is <ESC>[nnG where nn is the column to move to, in decimal.

Of course, if the first column is longer than the allocated space, the result isn't too nice:

$ printf "%s\033[10G-\n" "abcdefghijkl"
abcdefghi-kl

To work around that, you could explicitly clear the rest of the line (<ESC>[K) before printing the following column.

$ printf "%s\033[10G\033[K-\n" "abcdefghijkl"
abcdefghi-
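
Since the escape sequences only make sense on a terminal, a script might want to fall back to plain output when stdout is redirected. A minimal sketch, assuming a hypothetical helper named at_column and a single-space fallback chosen purely for illustration:

```bash
#!/bin/bash
# at_column is a hypothetical helper: print TEXT, then "-" at column COL.
at_column() {
    local col=$1 text=$2
    if [ -t 1 ]; then
        # stdout is a terminal: jump to the column, then clear to end of line
        printf '%s\033[%dG\033[K-\n' "$text" "$col"
    else
        # stdout is a pipe or a file: emit no escape sequences
        printf '%s -\n' "$text"
    fi
}

for s in "abc" "├─cd" "└──ef"; do
    at_column 10 "$s"
done
```

The [ -t 1 ] test is true only when file descriptor 1 is attached to a terminal, so piping the script into a file or another command takes the plain branch.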

Another way would be to do the padding manually, given something that can determine the length of the string in characters. In Bash, ${#string} seems to work for simple characters, though the result is a bit ugly. Zero-width and double-width characters will probably break it, and I didn't test combining characters either.

#!/bin/bash
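pad() {
    # parameters:
    #  1: name of variable to pad
    #  2: length to pad to
    local string=${!1}
    # ${#string} counts characters (not bytes) in a UTF-8 locale
    local fill=$(($2 - ${#string}))
    # never pad by a negative amount: "%-3s" would append three spaces
    ((fill < 0)) && fill=0
    printf -v "$1" "%s%${fill}s" "$string" ""
}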
echo "1234567890"
for x in "abc" "├─cd" "└──ef" ; do
    pad x 9
    printf "%s-\n" "$x"
done

And the output is:

1234567890
abc      -
├─cd     -
└──ef    -
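
The same idea can be applied to the arrays from the question. This is only a sketch: print_padded is a made-up name, and it assumes a UTF-8 locale so that ${#s} counts characters rather than bytes. It inherits the limitations mentioned above for zero-width, double-width and combining characters.

```bash
#!/bin/bash
# print_padded is a hypothetical helper: left-justify each remaining
# argument in a field of $1 characters, followed by "|".
print_padded() {
    local width=$1 s fill
    shift
    for s in "$@"; do
        fill=$(( width - ${#s} ))      # characters still missing
        (( fill < 0 )) && fill=0       # overlong strings get no padding
        # "%*s" takes its width from the argument list; padding an empty
        # string yields exactly $fill spaces
        printf '%s%*s|\n' "$s" "$fill" ""
    done
}

a=("0" "00" "000")
b=("├─00" "├─000" "└─00000")
print_padded 15 "${a[@]}" "${b[@]}"
```

Because the padding is produced from an empty string, printf's own byte counting never sees the multi-byte characters.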

Related Solutions

Bash – Using Dashes in Printf

The -- is used to tell the program that whatever follows should not be interpreted as a command line option to printf.

Thus the printf "--" you tried basically ended up as "printf with no arguments" and therefore failed.
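
Two ways to print text that starts with dashes, shown as a small sketch with example strings chosen for illustration:

```bash
#!/bin/bash
# Mark the end of options first; the next word is then the format string.
printf -- '--\n'
# prints: --

# Or keep the dashes out of the format string and pass them as an argument.
printf '%s\n' '--'
# prints: --

# The same applies to any format string that begins with a dash.
printf -- '-%s-\n' 'x'
# prints: -x-
```

The second form is usually the safer habit, since the data never gets interpreted as a format string either.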

Bash prevent printf from interrupting another printf

What you need is a mutex:

Each sub-shell writes concurrently to the same /dev/stdout as the parent shell, so you cannot guarantee the order of the output, even within a single function. To guarantee it, you need a lock that enforces mutual exclusion, i.e. no other process starts writing to /dev/stdout until the lock is released.

#!/bin/bash

function foo {
    local lockdir=/tmp/myscript.lock
    # mkdir is atomic: it succeeds for exactly one process, which then holds the lock
    until mkdir "$lockdir" 2>/dev/null; do :; done
    printf 'Test line break: %s\nafter line break: %s\n\n' "$1" "$1"
    # release the lock
    rm -rf "$lockdir"
}

for VARIABLE in {1..30}
do
    foo "$VARIABLE" &
done
wait

This produces the following output:

$ bash plop1 2>/dev/null
Test line break: 5
after line break: 5

Test line break: 3
after line break: 3

Test line break: 11
after line break: 11

Test line break: 23
after line break: 23

Test line break: 14
after line break: 14

Test line break: 17
after line break: 17

Test line break: 24
after line break: 24

Test line break: 21
after line break: 21

Test line break: 27
after line break: 27

Test line break: 6
after line break: 6

Test line break: 2
after line break: 2

Test line break: 9
after line break: 9

Test line break: 26
after line break: 26

Test line break: 29
after line break: 29

Test line break: 20
after line break: 20

Test line break: 1
after line break: 1

Test line break: 12
after line break: 12

Test line break: 4
after line break: 4

Test line break: 13
after line break: 13

Test line break: 10
after line break: 10

Test line break: 15
after line break: 15

Test line break: 28
after line break: 28

Test line break: 25
after line break: 25

Test line break: 19
after line break: 19

Test line break: 18
after line break: 18

Test line break: 8
after line break: 8

Test line break: 7
after line break: 7

Test line break: 16
after line break: 16

Test line break: 22
after line break: 22

Test line break: 30
after line break: 30
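
The mkdir-based lock could also be factored into small helpers, so that a critical section can span several printf calls. This is a sketch only: the names lock, unlock, report and run_jobs and the lock path are invented for illustration, and the fractional sleep assumes a sleep implementation that accepts it (GNU and BSD do).

```bash
#!/bin/bash
# Hypothetical lock location; $$ is the parent shell's PID, so every
# background job started from this script shares the same path.
lockdir="${TMPDIR:-/tmp}/printf-demo.$$.lock"

# mkdir either creates the directory or fails, so only one job gets the lock.
lock()   { until mkdir "$lockdir" 2>/dev/null; do sleep 0.05; done; }
unlock() { rmdir "$lockdir"; }

report() {
    lock
    # Both printf calls run while the lock is held, so their output stays together.
    printf 'Test line break: %s\n' "$1"
    printf 'after line break: %s\n\n' "$1"
    unlock
}

run_jobs() {
    local i
    for i in "$@"; do
        report "$i" &
    done
    wait
}

run_jobs 1 2 3 4 5
```

Sleeping briefly between attempts avoids spinning at full speed while another job holds the lock. The order of the jobs is still arbitrary; only the lines within each job are kept together.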
