How to Gather Byte Occurrence Statistics in a Binary File

binary, command line, files, statistics

I'd like to know the equivalent of

cat inputfile | sed 's/\(.\)/\1\n/g' | sort | uniq -c

as presented in https://stackoverflow.com/questions/4174113/how-to-gather-characters-usage-statistics-in-text-file-using-unix-commands, which produces character usage statistics for text files, but for binary files, counting individual bytes instead of characters. The output should be of the form

18383 57
12543 44
11555 127
 8393 0

It doesn't matter if the command takes as long as the referenced one for characters.

If I apply the command for characters to binary files, the output contains statistics for arbitrarily long sequences of unprintable characters (I am not looking for an explanation of that).

Best Answer

With GNU od:

od -vtu1 -An -w1 my.file | sort -n | uniq -c

Or more efficiently with perl (this also prints a count of 0 for byte values that don't occur):

perl -ne 'BEGIN{$/ = \4096};
          $c[$_]++ for unpack("C*");
          END{for ($i=0;$i<256;$i++) {
              printf "%3d: %d\n", $i, $c[$i]}}' my.file
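
The -w option in the od pipeline above is a GNU extension. As a rough sketch of a variant that relies only on od options specified by POSIX (-A, -t, -v), the function below splits od's multi-column output into one value per line with tr, then orders the result by count so it resembles the form shown in the question. The name byte_stats is made up for illustration, and this pipeline has not been benchmarked against either command above.

```shell
# byte_stats FILE: print "count value" lines, most frequent byte first.
# byte_stats is a hypothetical helper name, not part of any standard tool.
byte_stats() {
    od -An -v -tu1 < "$1" |    # dump every byte as an unsigned decimal number
        tr -s ' ' '\n' |       # turn the blank-separated columns into one value per line
        grep . |               # drop the empty line produced by leading blanks
        sort -n | uniq -c |    # count how often each byte value occurs
        sort -k1,1nr -k2,2n    # highest count first, ties ordered by byte value
}
```

Called as byte_stats my.file, it lists only the byte values that actually occur. The counts should add up to the size reported by wc -c my.file, which gives a quick sanity check.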