How did file identify this particular file?

file-command

I'm running file against a wallet.dat file (the file in which Bitcoin keeps its private keys), and even though there doesn't seem to be any identifiable header or string, file can still tell that it's a Berkeley DB file, even if I cut it down to 16 bytes.

I know that file is applying some sort of rule or searching for some byte sequence to identify it. I want to know which rule it's applying here, so that I can duplicate it in my own program.

Best Answer

Grab the source of the file command. Most, if not all, open-source Unices use this one. The file command comes with the magic database, named after the magic numbers that it describes. (This database is also installed on your live system, but in a compiled form.) Look for the file in the source tree that contains the description text that you see:

grep 'Berkeley DB' magic/Magdir/*

The magic(5) man page describes the format of these files. The trigger lines for “Berkeley DB” are:

0       long    0x00061561      Berkeley DB
0       belong  0x00061561      Berkeley DB
12      long    0x00061561      Berkeley DB
12      belong  0x00061561      Berkeley DB
12      lelong  0x00061561      Berkeley DB
12      long    0x00053162      Berkeley DB
12      belong  0x00053162      Berkeley DB
12      lelong  0x00053162      Berkeley DB
12      long    0x00042253      Berkeley DB
12      belong  0x00042253      Berkeley DB
12      lelong  0x00042253      Berkeley DB
12      long    0x00040988      Berkeley DB
12      belong  0x00040988      Berkeley DB
12      lelong  0x00040988      Berkeley DB

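The first column specifies the offset at which a value is to be found. The second column gives the type of that value: long means 4 bytes in the platform's native byte order, while lelong and belong mean 4 bytes in little-endian and big-endian order respectively. The third column is the value to match, and the rest of the line is the description that file prints on a match. This is also why cutting the file down to 16 bytes doesn't stop it from being recognized: these rules only look at bytes 0 to 3 and 12 to 15.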
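If you do want to duplicate these rules in your own program, a minimal Python sketch might compare the 32-bit values at offsets 0 and 12 against the magic numbers above. It reproduces only the top-level trigger lines listed here, not any further lines in the magic database that refine the description. It treats long as matching either byte order, which is what these rules amount to on a little-endian host. The function name looks_like_berkeley_db is hypothetical, and the hash/btree/queue/log labels assume the usual Berkeley DB meanings of these constants.

```python
import struct

# Magic values from the "Berkeley DB" trigger lines. The labels assume the
# usual Berkeley DB meanings of these constants.
BDB_MAGICS = {
    0x00061561: "hash",
    0x00053162: "btree",
    0x00042253: "queue",
    0x00040988: "log",
}

# (offset, magic values allowed at that offset), mirroring the rule list:
# offset 0 only has the hash magic, offset 12 has all four.
RULES = [
    (0, {0x00061561}),
    (12, set(BDB_MAGICS)),
]


def looks_like_berkeley_db(header: bytes):
    """Return (offset, byte_order, kind) for the first matching rule, else None.

    Hypothetical helper: only the top-level trigger lines are reproduced,
    and `long` (native order) is treated as matching either byte order.
    """
    for offset, magics in RULES:
        chunk = header[offset:offset + 4]
        if len(chunk) < 4:
            continue  # data too short to contain this 4-byte field
        for order, fmt in (("big-endian", ">I"), ("little-endian", "<I")):
            value = struct.unpack(fmt, chunk)[0]
            if value in magics:
                return offset, order, BDB_MAGICS[value]
    return None
```

For example, looks_like_berkeley_db(bytes(12) + b"\x62\x31\x05\x00") returns (12, 'little-endian', 'btree'), which is the pattern a btree file written on a little-endian machine has at offset 12. To try it on a real file, pass it the first 16 bytes, e.g. open("wallet.dat", "rb").read(16).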

Rather than replicate the rules, you may want to call the file utility; it's specified by POSIX, but the formats that it recognizes and the descriptions that it outputs aren't. Alternatively, you can link to libmagic and call the magic_file or magic_buffer function.
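If you take the libmagic route from Python without a third-party binding, a minimal sketch using ctypes might look like the following. It assumes libmagic is installed and can be found by ctypes.util.find_library. The wrapper name describe_buffer is hypothetical. The functions it calls (magic_open, magic_load, magic_buffer, magic_close) are libmagic's C API, and passing NULL to magic_load selects the default database.

```python
import ctypes
import ctypes.util

MAGIC_NONE = 0  # flag value from magic.h: no special behaviour


def describe_buffer(data: bytes):
    """Return libmagic's description of `data`, or None if libmagic is unavailable.

    Hypothetical wrapper around libmagic's C API via ctypes.
    """
    path = ctypes.util.find_library("magic")
    if path is None:
        return None  # libmagic not installed or not discoverable
    lib = ctypes.CDLL(path)

    # Declare the C signatures so ctypes passes pointers and sizes correctly.
    lib.magic_open.restype = ctypes.c_void_p
    lib.magic_open.argtypes = [ctypes.c_int]
    lib.magic_load.restype = ctypes.c_int
    lib.magic_load.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_buffer.restype = ctypes.c_char_p
    lib.magic_buffer.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_size_t]
    lib.magic_close.restype = None
    lib.magic_close.argtypes = [ctypes.c_void_p]

    cookie = lib.magic_open(MAGIC_NONE)
    if not cookie:
        return None
    try:
        if lib.magic_load(cookie, None) != 0:  # None (NULL) = default database
            return None
        result = lib.magic_buffer(cookie, data, len(data))
        return result.decode() if result is not None else None
    finally:
        lib.magic_close(cookie)
```

Because libmagic reads the same database as the file command, calling describe_buffer on the first 16 bytes of wallet.dat should normally give the same description that file prints.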
