Bash – How to use a bash script to read binary file content

bash, binary, text processing

I want to read one character and then a string of fixed length (the string is not null-terminated in the file; its length is given by the preceding character).

How can I do this in a bash script? How do I define the string variable so that I can do some post-processing on it?

Best Answer

If you want to stick with shell utilities, you can use head to extract a number of bytes, and od to convert a byte into a number.

export LC_ALL=C    # make sure we aren't in a multibyte locale
n=$(head -c 1 | od -An -t u1)
string=$(head -c $n)
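
As a rough illustration, the same two reads can be tried on a small sample file. The file contents and the variable `tmp` are made up for this sketch, and it assumes GNU `head` reading from a regular file, so that `head -c 1` consumes exactly one byte and leaves the rest for the next read.

```bash
export LC_ALL=C    # make sure we aren't in a multibyte locale

# Hypothetical sample: length byte 5, the data "hello", then unrelated bytes.
tmp=$(mktemp)
printf '\005helloEXTRA' > "$tmp"

# Group both reads so they share one file descriptor and one file offset.
{
  n=$(head -c 1 | od -An -t u1)   # od pads its output with spaces
  n=$((n))                        # arithmetic expansion drops the padding
  string=$(head -c "$n")          # read exactly n data bytes
} < "$tmp"
rm -f "$tmp"

printf 'length=%d string=%s\n' "$n" "$string"
```

The trailing `EXTRA` bytes stay unread, which is the point of the length prefix.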

However, this does not work for binary data. There are two problems:

  • Command substitution $(…) strips final newlines in the command output. There's a fairly easy workaround: make sure the output ends in a character other than a newline, then strip that one character.

    string=$(head -c $n; echo .); string=${string%.}
    
  • Bash, like most shells, is bad at dealing with null bytes. As of bash 4.1, null bytes are simply dropped from the result of the command substitution. Dash 0.5.5 and pdksh 5.2 have the same behavior, and AT&T ksh stops reading at the first null byte. In general, shells and their utilities aren't geared towards dealing with binary files. (Zsh is the exception: it's designed to support null bytes.)
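
Both problems are easy to reproduce. The sketch below assumes bash; the variable names `raw`, `kept` and `nul` are only for illustration. Newer bash versions also print a warning about the ignored null byte, which is silenced here by redirecting the shell's stderr.

```bash
export LC_ALL=C

# Problem 1: command substitution strips trailing newlines.
raw=$(printf 'ab\n\n')
# Workaround: append a sentinel character, then strip it again.
kept=$(printf 'ab\n\n'; echo .); kept=${kept%.}
echo "${#raw} ${#kept}"    # in bash: 2 4

# Problem 2: the null byte between 'a' and 'b' is dropped.
{ nul=$(printf 'a\0b'); } 2>/dev/null
echo "${#nul}"             # in bash: 2, not 3
```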

If you have binary data, you'll want to switch to a language like Perl or Python.

<input_file perl -e '
  binmode STDIN;                  # treat the input as raw bytes
  read STDIN, $c, 1 or die $!;    # read length byte
  $n = read STDIN, $s, ord($c);   # read data
  die $! if !defined $n;
  die "Input file too short" if ($n != ord($c));
  # Process $s here
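'

<input_file python3 -c '
import sys
stdin = sys.stdin.buffer        # binary stream (Python 3); sys.stdin would decode text
c = stdin.read(1)               # read length byte
if len(c) < 1: raise ValueError("input file is empty")
n = ord(c)
s = stdin.read(n)               # read data
if len(s) < n: raise ValueError("input file too short")
# Process s here
'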
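
If the data has to stay in the shell anyway, one possible workaround, which the answer above does not propose, is to keep the bytes as hexadecimal text so that null bytes never reach a shell variable. This is only a sketch: the sample file and the variable names `tmp` and `hex` are hypothetical, and it again assumes GNU `head` on a regular file.

```bash
export LC_ALL=C

# Hypothetical sample: length byte 3, then the bytes 'a', NUL, 'b', then extra data.
tmp=$(mktemp)
printf '\003a\000bREST' > "$tmp"

{
  n=$(( $(head -c 1 | od -An -t u1) ))
  # -v keeps od from collapsing repeated output lines into "*";
  # tr removes the spaces and newlines od puts between the hex bytes.
  hex=$(head -c "$n" | od -An -v -t x1 | tr -d ' \n')
} < "$tmp"
rm -f "$tmp"

echo "$n $hex"    # two hex digits per data byte
```

Post-processing then works on the hex digits, for example `${hex:2:2}` for the second byte in bash, at the cost of converting back whenever the raw bytes are needed.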