I'm struggling to work out what the octal 2-byte output from the od command is. I understand the octal output (-b flag) but the octal 2-byte output (-o) is a mystery to me.
Can someone shed some light on how the -o result is calculated from ASCII?
Here is an example:
[root@localhost lpi103-2]# cat text1
1 apple
2 pear
3 banana
[root@localhost lpi103-2]# od -c text1
0000000   1       a   p   p   l   e  \n   2       p   e   a   r  \n   3
0000020       b   a   n   a   n   a  \n
0000030
[root@localhost lpi103-2]# od -bc text1
0000000 061 040 141 160 160 154 145 012 062 040 160 145 141 162 012 063
          1       a   p   p   l   e  \n   2       p   e   a   r  \n   3
0000020 040 142 141 156 141 156 141 012
              b   a   n   a   n   a  \n
0000030
[root@localhost lpi103-2]# od -oc text1
0000000 020061 070141 066160 005145 020062 062560 071141 031412
          1       a   p   p   l   e  \n   2       p   e   a   r  \n   3
0000020 061040 067141 067141 005141
              b   a   n   a   n   a  \n
0000030
Best Answer
For historical reasons, od prints two-byte words¹ by default.
The number 020061 (octal) corresponds to the two-byte sequence 1␣ (␣ is a space character). Why? It's clearer if you use hexadecimal: 0o20061 = 0x2031, and ␣ is 0x20 (32) in ASCII and 1 is 0x31 (49). Notice that the lower-order bits (0x31) correspond to the first character and the higher-order bits (0x20) correspond to the second character: od is assembling the words in little-endian order, because that happens to be your system's endianness.²
Little-endian order looks unnatural here because one of the output formats (-c) prints characters while the other (-o) prints words. Each word is printed as a number in the usual big-endian notation (the most significant digit comes first in our left-to-right reading order). This is even more apparent in hexadecimal, where the byte boundaries are visible in the numerical output: the first word prints as 2031, that is, byte 0x20 followed by byte 0x31, while the file holds 0x31 first and 0x20 second.
If you prefer to view the file as a sequence of bytes, use od -t x1 (or hd if you have it).
¹ Once upon a time, men were real men, computers were real computers, numbers were often written in octal, and words were two bytes long.
² All PCs (x86, x86-64) are little-endian, as was the PDP-11 where Unix started. ARM CPUs can cope with either endianness but Linux and iOS use it in little-endian mode. So most of the platforms you're likely to encounter nowadays are little-endian.
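If it helps to see the arithmetic spelled out, here is a minimal Python sketch that rebuilds the od -o words from the raw bytes of the file in the question. The function name od_words is made up for illustration, and padding an odd trailing byte with a zero byte is an assumption about how od treats a file of odd length, not something the output above demonstrates.

```python
def od_words(data: bytes, byteorder: str = "little") -> list:
    """Return the 2-byte words of `data` as 6-digit octal strings.

    Hypothetical helper: it mimics what `od -o` shows, it is not od's code.
    """
    if len(data) % 2:
        data += b"\x00"  # assumption: pad an odd final byte with a zero byte
    words = []
    for i in range(0, len(data), 2):
        # Little-endian: the first byte supplies the low-order 8 bits,
        # the second byte supplies the high-order 8 bits.
        value = int.from_bytes(data[i:i + 2], byteorder)
        words.append(f"{value:06o}")
    return words


text1 = b"1 apple\n2 pear\n3 banana\n"
print(" ".join(od_words(text1)))
# The same bytes read as big-endian words give different numbers:
print(" ".join(od_words(text1, "big")))
```

With the default "little" byte order the first word comes out as 020061, matching the od -oc output in the question. Passing "big" gives 030440 for the same two bytes, which is what a big-endian machine would be expected to print.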