How Octal 2-Byte Output is Calculated from od

I'm struggling to work out how the od command produces its octal 2-byte output (-o flag). I understand the per-byte octal output (-b flag), but the 2-byte form is a mystery to me.

Can someone shed some light on how the -o result is calculated from ASCII?

Here is an example:

[root@localhost lpi103-2]# cat text1
1 apple
2 pear
3 banana
[root@localhost lpi103-2]# od -c text1
0000000   1       a   p   p   l   e  \n   2       p   e   a   r  \n   3
0000020       b   a   n   a   n   a  \n
0000030
[root@localhost lpi103-2]# od -bc text1
0000000 061 040 141 160 160 154 145 012 062 040 160 145 141 162 012 063
          1       a   p   p   l   e  \n   2       p   e   a   r  \n   3
0000020 040 142 141 156 141 156 141 012
              b   a   n   a   n   a  \n
0000030
[root@localhost lpi103-2]# od -oc text1
0000000  020061  070141  066160  005145  020062  062560  071141  031412
          1       a   p   p   l   e  \n   2       p   e   a   r  \n   3
0000020  061040  067141  067141  005141
              b   a   n   a   n   a  \n
0000030

Best Answer

For ~~hysterical~~ historical reasons, od prints two-byte words¹ by default.

The number 020061 (octal) corresponds to the two-byte sequence 1␣ (␣ is a space character). Why? It's clearer if you use hexadecimal: 0o20061 = 0x2031, and ␣ is 0x20 (32) in ASCII and 1 is 0x31 (49). Notice that the lower-order bits (0x31) correspond to the first character and the higher-order bits correspond to the second character: od is assembling the words in little-endian order, because that happens to be your system's endianness.²
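
To check the arithmetic, here is a minimal Python sketch that reproduces the -o column from the question. The function name od_octal_words is made up for illustration, and the code assumes a little-endian machine (as above) and that a trailing odd byte is padded with a zero byte; it is not how od itself is implemented.

```python
import struct

def od_octal_words(data: bytes) -> list:
    """Format bytes the way `od -o` appears to on a little-endian machine.

    Hypothetical helper for illustration only.
    """
    if len(data) % 2:
        data += b"\x00"  # assumption: an odd trailing byte is zero-padded
    count = len(data) // 2
    # "<H" means little-endian unsigned 16-bit: the first byte is the low half
    words = struct.unpack(f"<{count}H", data)
    # six octal digits, zero-padded, matching the od output in the question
    return [f"{w:06o}" for w in words]

text1 = b"1 apple\n2 pear\n3 banana\n"
print(" ".join(od_octal_words(text1[:16])))  # first od line (offset 0000000)
print(" ".join(od_octal_words(text1[16:])))  # second od line (offset 0000020)
```

The first word comes from the bytes 0x31 ('1') and 0x20 (space): 0x20 * 256 + 0x31 = 0x2031, which is 020061 in octal.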

Little-endian order looks unnatural here because the two output formats follow different conventions: -c prints individual characters in file order, while -o prints words. Each word is printed as a number in the usual big-endian notation (the most significant digit comes first in our left-to-right reading order), so the two bytes of each word appear swapped relative to the character line. This is easier to see in hexadecimal, where the byte boundaries are visible in the numerical output:

echo '1 text' | od -xc   
0000000 2031 6574 7478 000a
         1    t e  x t \n\0
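
The same hex words can be rebuilt by hand. The sketch below (hex_words is a hypothetical helper, not part of od) groups the 7 bytes of "1 text\n" into 16-bit words under both byte orders. It assumes the odd trailing byte is padded with a zero byte, which is consistent with the final 000a above.

```python
def hex_words(data: bytes, byteorder: str = "little") -> list:
    """Group bytes into 16-bit words and format each as 4 hex digits."""
    if len(data) % 2:
        data += b"\x00"  # assumption: zero-pad an odd trailing byte
    return [
        format(int.from_bytes(data[i:i + 2], byteorder), "04x")
        for i in range(0, len(data), 2)
    ]

line = b"1 text\n"
print(" ".join(hex_words(line)))          # little-endian, as on x86
print(" ".join(hex_words(line, "big")))   # what a big-endian machine would show
```

With "big", each word reads in the same order as the characters (3120 7465 7874 0a00), which is why the little-endian output looks swapped next to the -c line.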

If you prefer to view the file as a sequence of bytes, use od -t x1 (or hd if you have it).

¹ Once upon a time, men were real men, computers were real computers, numbers were often written in octal, and words were two bytes long.

² All PCs (x86, x86-64) are little-endian, as was the PDP-11 where Unix started. ARM CPUs can cope with either endianness but Linux and iOS use it in little-endian mode. So most of the platforms you're likely to encounter nowadays are little-endian.
