Grep – Split Binary Data by Fixed Byte Offset

binarygrepsplitxxd

I have binary data which I review by xxd -ps in hex format.
I notice that the byte distance is 48300 (=805*60) bytes between two headers where the separator is fafafafa.
There is the beginning of the file which should be skipped.

Example hex data where 48300 bytes between headers fafafafa which you can get here called data26.6.2015.txt where three headers and its nearly equivalent binary here called test_27.6.2015.bin which has only first two headers.
In both files, the data of last header is not of complete length; otherwise, you can assume that the byte offset is fixed i.e. the length of data between headers.

Pseudocode of algorithm

look header end position
look first two header positions and set the difference of these positions (d2 – d1) the distance between events; this event length is the fixed (777)
split data by byte position (777) – TODO should I split binary format or as xxd -ps converted data? by byte position (777)

I can convert data back to binary by xxd -r like xxd -ps | split and store | xxd -r but I am still unsure if this is necessary.

In which stage can you split binary data?
Only in xxd -ps converted format or as binary data.

If splitting in xxd -ps converted format, I think for loop is to only way then go through the file.
Possible tools for splitting csplit, split, …, not sure.
However, I am uncertain.

Output from grep (ggrep is gnu grep) on the hex data

$ xxd -ps r328.raw  | ggrep -b -a -o -P 'fafa' | head
49393:fafa
49397:fafa
98502:fafa
98506:fafa
147611:fafa
147615:fafa
196720:fafa
196725:fafa
245830:fafa
245834:fafa

while doing the similar grep in the binary file giving emptyline only as an output.

$ ggrep -b -a -o '\xfa' r328.raw

Documentation

Documentation given to me is found here and here as a picture the general SRS data format:

enter image description here

In which stage can you split binary data (as binary data or as xxd -ps converted data)?

Best Answer

You can operate on the binary file without needing to go through xxd. I ran your data back through xxd and used grep -b to show me the byte offsets of your pattern (converted from hex to chars \xfa) in the binary file.

I removed with sed the matched characters from the output to leave just the numbers. I then set the shell positional args to the resulting offsets (set -- ...)

xxd -r -p <data26.6.2015.txt >/tmp/f1
set -- $(grep -b -a -o -P '\xfa\xfa\xfa\xfa' /tmp/f1 | sed 's/:.*//')

You now have a list of offsets in $1, $2, ... You can then extract the part that interests you with dd, setting a block size to 1 (bs=1) so that it reads byte by byte. skip= says how many bytes to skip in the input, and count= the number of bytes to copy.

start=$1 end=$2
let count=$end-$start
dd bs=1 count=$count skip=$start </tmp/f1 >/tmp/f2

The above extracts from the start of the 1st pattern to just before the 2nd pattern. To not include the pattern, you can add 4 to start (and count reduces by 4).

If you want to extract all parts, use a loop around this same code, and add starting offset 0 and ending offset size-of-file to the list of numbers:

xxd -r -p <data26.6.2015.txt >/tmp/f1
size=$(stat -c '%s' /tmp/f1)
set -- 0 $(grep -b -a -o -P '\xfa\xfa\xfa\xfa' /tmp/f1 | sed 's/:.*//') $size
i=2
while [ $# -ge 2 ]
do start=$1 end=$2
   let count=$end-$start
   dd bs=1 count=$count skip=$start </tmp/f1 >/tmp/f$i
   let i=i+1
   shift
done

If grep doesnt manage to work with the binary data, you can use the xxd hex dump data. First remove all the newlines to have one enormous line, then do the grep using the unescaped hex values, but then divide all the offsets by 2, and do the dd with the raw file:

xxd -r -p <data26.6.2015.txt >r328.raw
tr -d '\n' <data26.6.2015.txt >f1
let size2=2*$(stat -c '%s' f1)
set -- 0 $(grep -b -a -o -P 'fafafafa' f1 | sed 's/:.*//') $size2
i=2
while [ $# -ge 2 ]
do  let start=$1/2
    let end=$2/2
    let count=$end-$start
    dd bs=1 count=$count skip=$start <r328.raw  >f$i
    let i=i+1
    shift
done

Pseudocode of algorithm

Output from grep (ggrep is gnu grep) on the hex data

Documentation

Best Answer

Related Solutions

Creating a single output stream out of three other streams produced in parallel

Grep –byte-offset doesn’t give me the bytes offset

Related Question