I am ambitiously trying to translate some C++ code into bash, for a myriad of reasons.
This code reads and manipulates a file type specific to my sub-field that is written and structured completely in binary. My first binary-related task is to copy the first 988 bytes of the header, exactly as-is, and put them into an output file that I can continue writing to as I generate the rest of the information.
I am pretty sure that my current solution isn't working, and realistically I haven't figured out a good way to determine this. So even if it is actually written correctly, I need to know how I would test this to be sure!
This is what I'm doing right now:
```sh
hdr_988=`head -c 988 ${inputFile}`
echo -n "${hdr_988}" > ${output_hdr}

headInput=`head -c 988 ${inputFile} | hexdump`
headOutput=`head -c 988 ${output_hdr} | hexdump`
if [ "${headInput}" != "${headOutput}" ]; then echo "output header was not written properly. exiting. please troubleshoot."; exit 1; fi
```
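A byte-exact version of that check can be sketched with `cmp`, which compares raw bytes rather than shell strings (a sketch only — the file names below are placeholders, not the question's actual files):

```sh
# Self-contained sketch: copy the first 988 bytes, then verify byte-for-byte.
# input.bin / header.bin are placeholder names.
{ printf 'HDR\0\1\2\3\n'; head -c 992 /dev/zero; } > input.bin  # 1000-byte demo input

head -c 988 input.bin > header.bin   # the copy itself

# cmp compares raw bytes, so NUL bytes and trailing newlines are checked
# faithfully ("-" means read one side from stdin).
if ! head -c 988 input.bin | cmp -s - header.bin; then
    echo "output header was not written properly. exiting. please troubleshoot." >&2
    exit 1
fi
echo "header verified: first 988 bytes match"
```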
If I use `hexdump`/`xxd` to look at this part of the output file, although I can't exactly read most of it, something seems wrong. And the comparison code I have written only tells me whether two strings are identical, not whether the bytes were copied the way I want them to be.
Is there a better way to do this in bash? Can I simply read binary bytes natively and copy them to a file verbatim (and ideally store them in variables as well)?
Best Answer
Dealing with binary data at a low level in shell scripts is generally a bad idea.
`bash` variables can't contain the byte 0. `zsh` is the only shell that can store that byte in its variables. In any case, command arguments and environment variables cannot contain those bytes, as they are NUL-delimited strings passed to the `execve()` system call.

Also note that command substitution, whether in its old form `` var=`cmd` `` or its modern form `var=$(cmd)`, strips all the trailing newline characters from the output of `cmd`. So, if that binary output ends in 0xa bytes, it will be mangled when stored in `$var`.
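You can see that stripping in action with any command whose output ends in newlines (a small sketch):

```sh
# Command substitution strips ALL trailing newlines, not just one:
out=$(printf 'abc\n\n\n')       # printf produces 6 bytes here
printf '%s' "$out" | wc -c      # but only 3 of them survive in $out
[ "$out" = abc ] && echo 'trailing newlines were stripped'
```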
Here, you'd need to store the data encoded, for instance with `xxd -p`. You could define helper functions to convert the data between its raw and hex forms.
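For instance (a sketch — the function names `encode`/`decode` are my own, and this assumes `xxd` is installed):

```sh
# encode: raw bytes -> one long line of lowercase hex (2 chars per byte)
encode() { xxd -p | tr -d '\n'; }
# decode: hex back to raw bytes
decode() { xxd -p -r; }

# Usage sketch (placeholder file names): hex is plain text, so it is safe
# to hold in a shell variable, unlike the raw bytes.
printf 'A\0B\nC\n' > input.bin
hdr=$(encode < input.bin)
printf '%s' "$hdr" | decode > copy.bin
cmp -s input.bin copy.bin && echo "round-trip OK: $hdr"
```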
`xxd -p` output is not space efficient, as it encodes 1 byte in 2 bytes, but it makes it easier to do manipulations with it (concatenating, extracting parts). `base64` is one that encodes 3 bytes in 4, but it is not as easy to work with.

The `ksh93` shell has a builtin encoding format (it uses `base64`) which you can use with its `read` and `printf`/`print` utilities.

Now, if there's no transit via shell or environment variables, or command arguments, you should be OK, as long as the utilities you use can handle any byte value. But note that, among text utilities, most non-GNU implementations can't handle NUL bytes, and you'll want to fix the locale to C to avoid problems with multi-byte characters. The last character not being a newline can also cause problems, as can very long lines (sequences of bytes between two 0xa bytes that are longer than
`LINE_MAX`).

`head -c`, where it's available, should be OK here, as it's meant to work with bytes and has no reason to treat the data as text, so something like `head -c 988 < "$inputFile" > "$output_hdr"` should be OK. In practice, at least the GNU, FreeBSD and `ksh93` builtin implementations are OK. POSIX doesn't specify the `-c` option, but it does say that `head` should support lines of any length (not limited to `LINE_MAX`).

With `zsh`, you can read raw bytes into a variable with its `read -k` builtin (or with `sysread` from the `zsh/system` module) and write them back out with the `print` builtin.
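The `ksh93` and `zsh` approaches can be sketched as below. This is a hedged sketch: it assumes `ksh93`'s `typeset -b`, `read -N` and `printf '%B'`, and `zsh`'s `read -k`; the file names are placeholders, and each part is skipped if the corresponding shell isn't installed.

```sh
# Demo input (placeholder name): 1000 bytes including NULs.
{ printf 'HDR\0\1\2\3\n'; head -c 992 /dev/zero; } > input.bin

# ksh93: a `typeset -b` variable stores binary data (base64-encoded internally);
# `read -N` reads an exact byte count, and printf's %B (which takes a variable
# NAME, not an expansion) writes the raw bytes back out.
if command -v ksh >/dev/null 2>&1 && ksh -c 'typeset -b x' 2>/dev/null; then
  ksh -c '
    typeset -b hdr
    read -N988 hdr < input.bin
    printf "%B" hdr > header.bin
  '
  head -c 988 input.bin | cmp -s - header.bin && echo "ksh93 round-trip OK"
fi

# zsh: variables may contain NUL bytes; read -u0 -k988 reads 988 characters
# (= bytes in the C locale) from stdin, and the print builtin writes them back.
if command -v zsh >/dev/null 2>&1; then
  zsh -c '
    LC_ALL=C
    IFS= read -ru0 -k988 hdr < input.bin
    print -rn -- $hdr > header.bin
  '
  head -c 988 input.bin | cmp -s - header.bin && echo "zsh round-trip OK"
fi
```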
Even in `zsh`, if `$var` contains NUL bytes, you can pass it as an argument to `zsh` builtins (like `print`) or to functions, but not as an argument to executables: arguments passed to executables are NUL-delimited strings. That's a kernel limitation, independent of the shell.