How does the unexpand command really work?

coreutils tabulation

I've already read some explanations of unexpand, but either I don't understand them or it is not working as expected.

Let's consider the following example:

[root@hope log]# echo "A12345678B" | tr '[1-8]' ' ' | unexpand -a
A        B
[root@hope log]# echo "A12345678B" | tr '[1-8]' ' ' | unexpand -a | od -ta
0000000   A  ht  sp   B  nl
0000005
[root@hope log]# echo "A12345678B12345678C" | tr '[1-8]' ' ' | unexpand -a | od -ta
0000000   A  ht  sp   B  ht  sp  sp   C  nl
0000011
[root@hope log]# echo "12345678" | tr '[1-8]' ' ' | unexpand -a | od -ta
0000000  ht  nl
0000002

I see that it replaces 8 blanks with one tab, but it seems to append one more space each time a non-blank character appears.

Using bash-4.3.42-3.fc23.x86_64 and coreutils-8.24-6.fc23.x86_64

Please could you explain this behaviour?

Best Answer

The unexpand program does not simply replace every 8 spaces with a tab. It rewrites the runs of spaces and tabs in a line on the assumption that a tab makes the terminal displaying the line move to the next tab stop. Tab stops are normally 8 columns apart, but on most terminals the interval can be changed, and each stop can be set individually.
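The "move to the next tab stop" model can be seen from the other direction with expand. This is a minimal sketch assuming the default 8-column stops; the variable names are only for illustration. The same tab character becomes a different number of spaces depending on the column where it starts:

```shell
# A tab after 1 character needs 7 spaces to reach column 8 ...
one=$(printf 'A\tB\n' | expand)
# ... but a tab after 4 characters needs only 4 spaces to reach the same stop.
four=$(printf 'AAAA\tB\n' | expand)
# In both lines B ends up in the same column.
printf '%s\n' "$one" "$four"
```

unexpand performs the reverse mapping, which is why the result depends on where each run of spaces sits relative to the tab stops, not only on how long the run is.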

In the first example, tr replaces the digits 1-8 in "A12345678B" with spaces. The first 8 characters of the result (the A and the seven spaces where 1-7 were) make up one tab interval, so unexpand replaces those seven spaces with a tab. That leaves a space (where the 8 was) at the first tab stop. The unexpand program does not add a space; that one is simply left over.
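As a small check of that reasoning, the following sketch (assuming default 8-column tab stops, with illustrative variable names) compares a run of seven spaces after the A with a run of eight:

```shell
# A is in column 0 and 7 spaces fill columns 1-7, so B sits exactly on the
# tab stop at column 8: the whole run becomes a single tab.
seven=$(printf 'A%7sB\n' '' | unexpand -a)
# With 8 spaces (the case from the question) the eighth space lies past the
# tab stop, so it is left in place after the tab.
eight=$(printf 'A%8sB\n' '' | unexpand -a)
printf '%s\n' "$seven" | od -c   # bytes: A \t B \n
printf '%s\n' "$eight" | od -c   # bytes: A \t space B \n
```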

One would use unexpand to convert a file containing mostly spaces (or a mixture of spaces and tabs) to a consistent format using tabs. A file whose lines are indented with spaces can be much larger than the same file indented with tabs. It can also be used to convert a file to different tab stops, e.g., to make a table whose columns align with one set of tab stops wider or narrower.
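The size difference is easy to see on a single line. This is a rough sketch assuming default 8-column stops; the sample text and variable names are made up for illustration. Without -a, unexpand converts only the leading blanks:

```shell
# A line indented with 16 spaces.
spaces=$(printf '%16sdeep\n' '')
# unexpand turns the 16 leading spaces into 2 tabs.
tabs=$(printf '%s\n' "$spaces" | unexpand)
# Compare the lengths (newline not counted): 20 characters versus 6.
echo "${#spaces} with spaces, ${#tabs} with tabs"
```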

A conversion between different tab intervals might be done like this:

expand -t 1,6,11,16,21 foo | unexpand -t 1,9,17,25,33 >bar
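A smaller variant of the same pipeline can be checked by hand. Note that this sketch uses uniform intervals (-t 4 and -t 8) rather than explicit stop lists, purely as an assumption to keep the arithmetic simple; the variable names are illustrative:

```shell
# Two tabs written for 4-column stops put x at column 8, which is also the
# first 8-column stop, so the 8 spaces collapse into one tab.
wide=$(printf '\t\tx\n' | expand -t 4 | unexpand -t 8)
# One tab written for 4-column stops puts x at column 4, short of the first
# 8-column stop, so the 4 spaces stay as spaces.
short=$(printf '\tx\n' | expand -t 4 | unexpand -t 8)
printf '%s\n' "$wide" "$short" | od -c
```

In both cases x stays in the column it occupied under the original tab stops; only the mix of tabs and spaces used to reach that column changes.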

Besides tab stops set in the terminal, some programs (such as vi) can display text with different tab intervals.
