How to Gather Byte Occurrence Statistics in a Binary File


I'd like to know the equivalent of

cat inputfile | sed 's/\(.\)/\1\n/g' | sort | uniq -c

presented in https://stackoverflow.com/questions/4174113/how-to-gather-characters-usage-statistics-in-text-file-using-unix-commands to produce character usage statistics for text files, but for binary files, counting plain bytes instead of characters, i.e. the output should be in the form of

It doesn't matter if the command takes as long as the referenced one for characters.

If I apply the command for characters to binary files, the output contains statistics for arbitrarily long sequences of unprintable characters (I'm not looking for an explanation of that).

Best Answer

With GNU od:

od -vtu1 -An -w1 my.file | sort -n | uniq -c
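The od pipeline above sorts one line per byte of input. A minimal alternative sketch, assuming GNU od (for -w1) and any POSIX awk, tallies the values in a single awk pass so that only the distinct values (at most 256) get sorted. The function name byte_hist is hypothetical, and it prints "value count" rather than uniq -c's "count value".

```shell
#!/bin/bash
# byte_hist FILE: print "byte_value count" for every byte value present in FILE.
# Hypothetical helper; assumes GNU od (-w1 puts one byte per line).
byte_hist() {
    od -An -v -tu1 -w1 "$1" |
        awk '{ c[$1]++ } END { for (b in c) print b, c[b] }' |
        sort -n    # "for (b in c)" has no defined order, so sort by byte value
}
```

For a file containing the four bytes "aab" followed by a NUL, byte_hist prints the lines "0 1", "97 2" and "98 1".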

Or, more efficiently, with perl (which also outputs a count of 0 for byte values that don't occur):

perl -ne 'BEGIN{$/ = \4096};
          $c[$_]++ for unpack("C*");
          END{for ($i=0;$i<256;$i++) {
              printf "%3d: %d\n", $i, $c[$i]}}' my.file
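Where perl isn't available, the same output format can be sketched with od and awk, again assuming GNU od; byte_table is a hypothetical name. Pre-filling all 256 counters with 0 in the BEGIN block is what makes absent byte values appear with a count of 0, like the perl loop over $i from 0 to 255.

```shell
#!/bin/bash
# byte_table FILE: print "value: count" for all 256 byte values, zeros included,
# in the same "%3d: %d" format as the perl one-liner. Hypothetical helper name.
byte_table() {
    od -An -v -tu1 -w1 "$1" |
        awk 'BEGIN { for (i = 0; i < 256; i++) c[i] = 0 }   # every value starts at 0
             { c[$1]++ }                                      # one byte per input line
             END { for (i = 0; i < 256; i++) printf "%3d: %d\n", i, c[i] }'
}
```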

Related Solutions

Grep – Split Binary Data by Fixed Byte Offset

You can operate on the binary file directly, without going through the xxd hex dump. I converted your data back to binary with xxd and used grep -b to show the byte offsets of your pattern (written as the escaped bytes \xfa instead of hex text) in the binary file.

I then used sed to strip the matched text from grep's output, leaving just the offsets, and set the shell's positional parameters to those offsets (set -- ...).

export LC_ALL=C    # treat the data as raw bytes; in a UTF-8 locale \xfa means the character U+00FA
xxd -r -p <data26.6.2015.txt >/tmp/f1
set -- $(grep -b -a -o -P '\xfa\xfa\xfa\xfa' /tmp/f1 | sed 's/:.*//')

You now have the list of offsets in $1, $2, ... You can then extract the part that interests you with dd, setting the block size to 1 (bs=1) so that it reads byte by byte: skip= says how many bytes to skip in the input, and count= how many bytes to copy.

start=$1 end=$2
let count=$end-$start
dd bs=1 count=$count skip=$start </tmp/f1 >/tmp/f2
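With bs=1, dd performs one read and one write per byte, which gets slow on large files. A hedged alternative sketch copies the same range [start, end) with tail and head -c. The name extract_range is hypothetical, and head -c is supported by GNU and BSD head but is not part of POSIX.

```shell
#!/bin/bash
# extract_range FILE START END: write bytes START..END-1 of FILE to stdout.
# Hypothetical helper; assumes a head that supports -c.
extract_range() {
    # tail -c +K starts output at byte K counted from 1, hence START + 1;
    # head -c then keeps exactly END - START bytes.
    tail -c +"$(($2 + 1))" "$1" | head -c "$(($3 - $2))"
}
```

Used in place of the dd line above, this would be extract_range /tmp/f1 "$start" "$end" >/tmp/f2.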

The above extracts from the start of the first pattern to just before the second one. To exclude the pattern itself, add 4 to start (which reduces count by 4).

If you want to extract all the parts, wrap the same code in a loop, and add a starting offset of 0 and an ending offset equal to the file size to the list of offsets:

export LC_ALL=C    # match and strip the \xfa bytes as raw bytes, not UTF-8 text
xxd -r -p <data26.6.2015.txt >/tmp/f1
size=$(stat -c '%s' /tmp/f1)
set -- 0 $(grep -b -a -o -P '\xfa\xfa\xfa\xfa' /tmp/f1 | sed 's/:.*//') $size
i=2
while [ $# -ge 2 ]
do start=$1 end=$2
   let count=$end-$start
   dd bs=1 count=$count skip=$start </tmp/f1 >/tmp/f$i
   let i=i+1
   shift
done
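The whole split can also be packaged as one function. This sketch (the name split_at_marker and its prefix argument are hypothetical) follows the loop above, writing pieces prefix2, prefix3, ..., each starting with its marker, with a few swaps: grep runs in the C locale with the marker passed as a fixed string of raw bytes, the offsets are kept with cut -d: -f1 rather than sed (whose . may not match invalid UTF-8 bytes in a UTF-8 locale), and each piece is copied with tail and head -c instead of dd bs=1. It assumes GNU grep (-b together with -o on binary data) and a head that supports -c.

```shell
#!/bin/bash
# split_at_marker IN PREFIX: split IN at every fa fa fa fa marker into
# PREFIX2, PREFIX3, ... (same numbering as the loop above). Hypothetical helper.
split_at_marker() {
    local in=$1 prefix=$2 marker size start end i=2
    marker=$(printf '\372\372\372\372')   # the four raw 0xfa bytes
    size=$(wc -c < "$in")
    # Offsets of each marker, plus 0 and the file size as outer boundaries
    set -- 0 $(LC_ALL=C grep -b -a -o -F "$marker" "$in" | cut -d: -f1) "$size"
    while [ $# -ge 2 ]; do
        start=$1 end=$2
        # Copy bytes start..end-1 without dd's one-byte-at-a-time I/O
        tail -c +"$((start + 1))" "$in" | head -c "$((end - start))" > "$prefix$i"
        i=$((i + 1))
        shift
    done
}
```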

If grep doesn't manage to work with the binary data, you can use the xxd hex dump data instead. First remove all the newlines to get one enormous line, then grep for the unescaped hex values. Since each byte is two hex digits, keep only the matches at even offsets (a match at an odd offset straddles byte boundaries and is not a real occurrence), divide those offsets by 2, and run dd on the raw file:

xxd -r -p <data26.6.2015.txt >r328.raw
tr -d '\n' <data26.6.2015.txt >f1
size2=$(stat -c '%s' f1)    # f1 already holds two hex digits per byte
set -- 0 $(grep -b -a -o -P 'fafafafa' f1 | sed 's/:.*//' | awk '$1 % 2 == 0') $size2
i=2
while [ $# -ge 2 ]
do  let start=$1/2
    let end=$2/2
    let count=$end-$start
    dd bs=1 count=$count skip=$start <r328.raw  >f$i
    let i=i+1
    shift
done
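The offset arithmetic of the hex-dump route can be checked in isolation. A sketch, with hex_marker_offsets as a hypothetical name: it joins the plain hex dump (as written by xxd -p) into one line, finds the marker as hex text, drops matches at odd character offsets, and halves the rest to get byte offsets into the raw file.

```shell
#!/bin/bash
# hex_marker_offsets HEXFILE: print byte offsets of the marker fa fa fa fa,
# given a plain hex dump. Hypothetical helper name.
hex_marker_offsets() {
    tr -d '\n' < "$1" |                     # a marker may be split across dump lines
        grep -b -o 'fafafafa' |             # character offsets of each hex match
        cut -d: -f1 |
        awk '$1 % 2 == 0 { print $1 / 2 }'  # odd offsets straddle bytes; halve the rest
}
```

For a dump containing "0fafafafa0" on one line and "fafafafa" on the next, grep finds the hex text at character offsets 1 and 10. The first is a false hit spanning the bytes 0f af af af a0, so the function prints only 5, the offset of the real marker.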

How to Gather Full Network Usage Statistics on FreeBSD Router

To get details of network transactions, you can use an implementation of a NetFlow generator for FreeBSD or Linux:

ng_netflow

NAME ng_netflow - Cisco's NetFlow implementation

DESCRIPTION The ng_netflow node implements Cisco's NetFlow export protocol on a router running FreeBSD. The ng_netflow node listens for incoming traffic and identifies unique flows in it. Flows are distinguished by endpoint IP addresses, TCP/UDP port numbers, ToS and input interface. Expired flows are exported out of the node in NetFlow version 5/9 UDP datagrams.
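For illustration, here is a sketch adapted from the single-interface example in the ng_netflow(4) manual page; check the manual for your FreeBSD version before relying on it. The interface em0 and the collector address 10.0.0.1:4444 are placeholders: with NAT, pick the inside (LAN) interface, as noted further down. It has to run as root, and the netgraph modules (ng_ether, ng_netflow, ng_ksocket) must be available. It is a configuration fragment for a live router, not something to run on a test box.

```shell
#!/bin/sh
# Insert a netflow node between em0's lower (wire side) and upper (kernel side)
# hooks, then attach a UDP ksocket to its export hook and point it at the
# collector. em0 and 10.0.0.1:4444 are placeholders.
/usr/sbin/ngctl -f- <<EOF
mkpeer em0: netflow lower iface0
name em0:lower netflow
connect em0: netflow: upper out0
mkpeer netflow: ksocket export inet/dgram/udp
msg netflow:export connect inet=10.0.0.1:4444
EOF
```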

As for NetFlow itself:

NetFlow is a network protocol developed by Cisco for collecting IP traffic information and monitoring network traffic. By analyzing flow data, a picture of network traffic flow and volume can be built.

See also RFC 3954, "Cisco Systems NetFlow Services Export Version 9".

To store the NetFlow data you also need what is known as a collector server. It can be either a Linux or a FreeBSD box, and it should not be installed on the router itself. One well-known implementation is NfSen:

NfSen is a graphical web based front end for the nfdump netflow tools.

NfSen allows you to:
- Display your netflow data: Flows, Packets and Bytes using RRD (Round Robin Database).
- Easily navigate through the netflow data.
- Process the netflow data within the specified time span.
- Create history as well as continuous profiles.
- Set alerts, based on various conditions.
- Write your own plugins to process netflow data on a regular interval.

Be aware that, depending on your available bandwidth, generating NetFlows can be taxing on the CPU. A common strategy in such cases is to mirror the router's switch port and use another machine to generate the flows.

Beyond a certain bandwidth, if generating NetFlows is a requirement, it probably makes more sense to get a professional router.

A final warning: if you use NAT, the NetFlows have to be captured on the inside (LAN) interface; otherwise you lose track of who is doing what.

Using NfSen to collect NetFlow data from Cisco equipment, I use around 100 GB of storage for 5-6 months of traffic; your mileage may vary.
