Binary Files – Understanding the Mystery of Binary Files

binary, compiling

This is about files that come straight out of a compiler, say g++ with the -o (output file) flag.

If they are binary, shouldn't they just be a bunch of 0's and 1's?

When you cat them, you get unintelligible output but also intact words.

If you run file on them, you get the answer immediately; there seems to be no computation involved. Do the binary files in fact have headers with this kind of information?

I thought a binary executable was just the compiled program, in the form of machine instructions that your CPU can understand instantly and unambiguously. If so, isn't that instruction set just bit patterns? But then, what's all the other stuff in the binaries? And how do you display the bits?

Also, if you somehow got hold of the manual for your processor, could you write a binary by hand, one machine instruction at a time? That would be terribly inefficient, but very fascinating if you got it to work even for a "Hello World!" demo.

Best Answer

This Super User question: Why don't you see binary code when you open a binary file with text editor? addresses your first point quite well.

Binary and text data aren't separate kinds of data: they are simply data. It is the interpretation that makes them one or the other. If you open binary data (such as an image file) in a text editor, much of it won't make sense, because it does not fit your chosen interpretation (as text). The intact words you see are the parts that happen to be stored as text inside the file, such as string literals and symbol names.
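To make the point about interpretation concrete, here is a minimal Python sketch. The four bytes are an arbitrary choice for illustration; the same bytes are read once as ASCII text, once as a little-endian 32-bit integer, and once as four small numbers.

```python
import struct

# Four arbitrary bytes, chosen for illustration only.
data = bytes([0x48, 0x69, 0x21, 0x0A])

as_text = data.decode("ascii")             # read them as ASCII characters
as_uint32 = struct.unpack("<I", data)[0]   # read them as one little-endian 32-bit integer
as_numbers = list(data)                    # read them as four separate small integers

print(repr(as_text))     # 'Hi!\n'
print(hex(as_uint32))    # 0xa216948
print(as_numbers)        # [72, 105, 33, 10]
```

Nothing about the bytes themselves says which reading is the "right" one; that is decided by whatever program consumes them.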

Files are stored as zeros and ones (e.g. voltage/no voltage in memory, magnetization/no magnetization on a hard drive). You don't see zeros and ones when you cat a file because cat writes the raw bytes to the terminal, and the terminal renders each byte as a character. Long 0/1 sequences wouldn't be of much use to a human anyway; characters make more sense for text, and a hex dump is better for most other purposes (try hexdump on a file).
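If you do want to see the actual bits, a few lines of Python are enough. This is a small sketch, not a replacement for hexdump or xxd; the helper names `to_bits`, `to_hex`, and `dump` are made up for this example.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


def to_hex(data: bytes) -> str:
    """Render each byte as two hexadecimal digits."""
    return " ".join(f"{byte:02x}" for byte in data)


def dump(path: str, limit: int = 16) -> None:
    """Print the first `limit` bytes of a file as hex and as bits."""
    with open(path, "rb") as f:   # "rb": read raw bytes, no text decoding
        chunk = f.read(limit)
    print(to_hex(chunk))
    print(to_bits(chunk))


sample = b"Hi"
print(to_bits(sample))  # 01001000 01101001
print(to_hex(sample))   # 48 69
```

Calling `dump("a.out")` on a compiled program would show its first 16 bytes both ways. The hex form carries the same information in a quarter of the space, which is why hex dumps are the usual choice.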

Executable files do have a header that describes parameters such as the architecture for which the program was built, and what sections of the file are code and data. This is what file uses to identify the characteristics of your binary file.
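As a rough illustration of how a tool like file can answer without any heavy computation, the sketch below reads a handful of fixed-position fields from the start of an ELF header, the executable format used on Linux. It assumes an ELF file; other formats (PE on Windows, Mach-O on macOS) have different headers. The function name `describe_elf` and the short lookup tables are made up for this example and cover only a few common values. The real file command works from a much larger database of signatures.

```python
import struct

# A few common values only; the ELF specification defines many more.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64", 243: "RISC-V"}
TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core dump"}


def describe_elf(header: bytes) -> str:
    """Describe an ELF file from its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    word_size = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    byte_order = {1: "<", 2: ">"}.get(header[5])  # struct prefixes: little/big endian
    if byte_order is None:
        return "ELF with unknown byte order"
    # e_type and e_machine are two 16-bit fields at offset 16.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    order_name = "little-endian" if byte_order == "<" else "big-endian"
    kind = TYPES.get(e_type, f"type {e_type}")
    machine = MACHINES.get(e_machine, f"machine {e_machine}")
    return f"ELF {word_size} {order_name} {kind}, {machine}"


# A synthetic 20-byte header: 64-bit, little-endian, executable, x86-64.
fake = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
print(describe_elf(fake))  # ELF 64-bit little-endian executable, x86-64
```

For a real file, pass in the result of `open(path, "rb").read(20)`. Note that position-independent executables are marked with the "shared object" type in the header, which is why they show up that way here.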

Finally: yes, you can write programs in assembly language, which maps almost one-to-one onto CPU instructions, or even write out the opcode bytes yourself. Take a look at Introduction to UNIX assembly programming and the Intel x86 documentation for a starting point.
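For the curious, here is a sketch of what "one machine instruction at a time" can look like. It assumes Linux on x86-64 and uses Python only as a convenient way to lay out bytes: every instruction is spelled out as raw opcode bytes, preceded by the minimal ELF header and single program header the kernel needs in order to load the file. The load address 0x400000 is a conventional choice, and the names `build_hello_elf` and `write_executable` are made up for this example.

```python
import os
import struct

BASE = 0x400000                  # virtual address the file is mapped at (conventional choice)
EHDR_SIZE, PHDR_SIZE = 64, 56    # sizes of the ELF64 file header and one program header


def build_hello_elf(msg: bytes = b"Hello World!\n") -> bytes:
    """Return the bytes of a tiny Linux x86-64 executable that prints msg."""

    def code_for(msg_addr: int) -> bytes:
        return b"".join([
            b"\xb8" + struct.pack("<I", 1),          # mov eax, 1      (write syscall)
            b"\xbf" + struct.pack("<I", 1),          # mov edi, 1      (stdout)
            b"\xbe" + struct.pack("<I", msg_addr),   # mov esi, msg_addr
            b"\xba" + struct.pack("<I", len(msg)),   # mov edx, len(msg)
            b"\x0f\x05",                             # syscall
            b"\xb8" + struct.pack("<I", 60),         # mov eax, 60     (exit syscall)
            b"\x31\xff",                             # xor edi, edi    (exit status 0)
            b"\x0f\x05",                             # syscall
        ])

    entry = BASE + EHDR_SIZE + PHDR_SIZE             # code starts right after the headers
    code = code_for(entry + len(code_for(0)))        # message sits right after the code
    total = EHDR_SIZE + PHDR_SIZE + len(code) + len(msg)

    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        b"\x7fELF\x02\x01\x01" + bytes(9),  # magic, 64-bit, little-endian, version 1
        2, 62, 1,                           # executable, x86-64, ELF version 1
        entry, EHDR_SIZE, 0, 0,             # entry point, program header offset, no sections, no flags
        EHDR_SIZE, PHDR_SIZE, 1, 0, 0, 0,   # header sizes, one program header, no section headers
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1, 5,                 # loadable segment, readable + executable
        0, BASE, BASE,        # map the file from offset 0 to address BASE
        total, total, 0x1000, # size in file, size in memory, alignment
    )
    return ehdr + phdr + code + msg


def write_executable(path: str = "hello") -> None:
    """Write the binary to disk and mark it executable."""
    with open(path, "wb") as f:
        f.write(build_hello_elf())
    os.chmod(path, 0o755)
```

Calling `write_executable()` and then running `./hello` on a Linux x86-64 machine should print the message. The whole file is 164 bytes, and running file on it reports an ELF 64-bit executable, because the header fields say exactly that.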
