Linux – How to convert two-valued text data to binary (bit-representation)

Tags: binary, command line, dd, linux, text processing

I have a text file containing only two possible characters (and possibly newlines, \n). Example:

ABBBAAAABBBBBABBABBBABBB

(Size 24 bytes)

How can I convert this to a binary file, meaning a bit representation, with each one of the two possible values being assigned to 0 or 1?

Resulting binary file (0=A, 1=B):

011100001111101101110111     # 24 bits - not 24 ASCII characters

Resulting file in Hex:

70FB77                       # 3 bytes - not 6 ASCII characters

I am mostly interested in a command-line solution (perhaps using dd, xxd, od, tr, printf, or bc). Also, regarding the inverse: how do I get back the original?

Best Answer

Another Perl solution:

perl -pe 'BEGIN { binmode \*STDOUT } chomp; tr/AB/\0\1/; $_ = pack "B*", $_'

Proof:

$ echo ABBBAAAABBBBBABBABBBABBB | \
    perl -pe 'BEGIN { binmode \*STDOUT } chomp; tr/AB/\0\1/; $_ = pack "B*", $_' | \
    od -tx1
0000000 70 fb 77
0000003

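The above reads input one line at a time and packs each line separately. It's up to you to make sure the lines are exactly what they are supposed to be: pack "B*" zero-pads the last byte of any line whose length is not a multiple of 8, and characters other than A and B are not rejected.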

Edit: The reverse operation:

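#!/usr/bin/env perl
use strict;
use warnings;

binmode \*STDIN;    # read raw bytes

my $cnt = 0;        # number of bytes converted so far
while ( defined ( $_ = getc ) ) {
    $_ = unpack "B*";    # one byte -> eight '0'/'1' characters
    tr/01/AB/;           # map the bits back to the letters
    print;
    print "\n" if ( not ++$cnt % 3 );    # newline after every 3 bytes
}
print "\n" if ( $cnt % 3 );    # terminate a final partial line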

This reads one byte of input at a time and starts a new line after every 3 bytes (24 characters).

Edit 2: Simpler reverse operation:

perl -pe 'BEGIN { $/ = \3; $\ = "\n"; binmode \*STDIN } $_ = unpack "B*"; tr/01/AB/'

The above reads 3 bytes at a time from STDIN. Receiving EOF in the middle of a 3-byte record is not a fatal problem: the shorter final record is converted and printed as a shorter line.
