macOS – I’ve created multiple DMG files from the same folder – the checksum is different on each

apfs, disk-utility, dmg, macos

Quite some time ago I noticed that even when I create DMG files from the same directory, with the same files, the results are always different. Not only do their sizes differ by ~15 bytes, but their SHA checksums (and their contents, when viewed in a hex editor) differ drastically. Just out of curiosity, I created 5 compressed, unencrypted DMG files from the same folder containing nothing but a single text file. The results are:

  • 0.dmg | size – 26 204 bytes, checksum – 5ba9ba0ee4d8ec5ba4718f1b491baf31c2c4e642
  • 1.dmg | size – 26 221 bytes, checksum – a86d76f6c07ee5a81c0aefb31b6fd40ef787ebd5
  • 2.dmg | size – 26 235 bytes, checksum – a31f4cf29e4e2858b7ac63c82574499200d81108
  • 3.dmg | size – 26 209 bytes, checksum – f3c19414279b6d6b94b90341453906e4a69e28dd
  • 4.dmg | size – 26 217 bytes, checksum – 9603c0334125762fc7908343e3ee400e038fe779

I've been browsing the internet hoping to find anything about the "data randomizer in APFS", but… obviously couldn't find a single thing, and in addition, not a lot of people actually knew about this "feature". Is there any info about it?

I'm running macOS 10.12.6, the DMG files were created using Disk Utility, but I get the same results with hdiutil.

Best Answer

Copies of an existing dmg will be identical but separately created dmg files will not.

Effectively Guaranteed to Differ

The Apple Disk Image .dmg format effectively guarantees that no two disk images will be bit for bit identical. Equality between disk images containing the same contents is not a practical requirement of the format.

UUID within the 0x6B6F6C79 / koly Block

Within the dmg file format is the koly structure. This structure includes a SegmentID field of type uuid_t, a 128-bit Universally Unique Identifier (UUID). The SegmentID alone is enough to ensure that separately created dmg files differ by at least one bit, and in practice by many more.

Using HFSleuth on the iTunes 11.0 disk image shows the embedded UUID:

HFSleuth> ver
Verbose output is on
HFSleuth> fs iTunes11.dmg
KOLY header found at 200363895:
    UDIF version 4, Header Size: 512
    Flags:1
    Rsrc fork: None
    Data fork: from 0, spanning 200307220 bytes
    XML plist: from 200307220, spanning 56675 bytes (to 200363895)
    Segment #: 1, Count: 1
    Segment UUID: 626f726e-7743259b-6086eb93-4b42fb65
    Running Data fork offset 0
    Sectors: 1022244

In the example above, the line Segment UUID: 626f726e-7743259b-6086eb93-4b42fb65 is a universally unique identifier embedded in the disk image.
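If you want to look at this field without HFSleuth, a minimal Python sketch along these lines should work. It assumes the koly block is the last 512 bytes of the file (consistent with the offsets and "Header Size: 512" in the output above) and that SegmentID sits at byte offset 64 within it. That offset comes from the commonly circulated reverse-engineered layout of the structure, not from official Apple documentation, so treat it as an assumption. The function name `read_segment_id` is made up for this example.

```python
import struct
import uuid

KOLY_SIZE = 512          # trailer size, matching "Header Size: 512" above
SEGMENT_ID_OFFSET = 64   # ASSUMED offset of SegmentID inside the koly block


def read_segment_id(image: bytes) -> uuid.UUID:
    """Return the 128-bit SegmentID from the trailing koly block of a dmg."""
    if len(image) < KOLY_SIZE:
        raise ValueError("too short to contain a koly block")
    trailer = image[-KOLY_SIZE:]
    # The block starts with the 4-byte signature, followed by big-endian
    # 32-bit version and header-size fields.
    signature, version, header_size = struct.unpack_from(">4sII", trailer, 0)
    if signature != b"koly":
        raise ValueError("koly signature not found")
    raw = trailer[SEGMENT_ID_OFFSET:SEGMENT_ID_OFFSET + 16]
    return uuid.UUID(bytes=raw)
```

Calling it on the contents of two separately created images, e.g. `read_segment_id(open("0.dmg", "rb").read())` and the same for `1.dmg`, should return two different UUIDs if the layout assumption holds. The way the UUID is printed may not match HFSleuth's grouping of the same bytes.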

One Bit Differences and Hash Functions

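A difference of even one bit in the input changes, on average, about half of the output bits of a cryptographic hash function such as SHA-1 or SHA-2 (the avalanche effect). That is why the checksums of two nearly identical disk images look completely unrelated.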
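To see the effect for yourself, here is a small Python sketch that flips a single bit in some input and counts how many bits of the SHA-256 digest change. The input bytes are just a stand-in for a disk image, and the helper names `flip_bit` and `differing_bits` are invented for this illustration.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Stand-in data; a real test would read the bytes of a dmg file instead.
original = b"stand-in for the bytes of a disk image" * 100
modified = flip_bit(original, 0)

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()
changed = differing_bits(digest_a, digest_b)
print(f"{changed} of {len(digest_a) * 8} digest bits differ")
```

The inputs differ in one bit, yet the count printed should land somewhere around half of the 256 digest bits rather than near 1.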

The use of a UUID within the structure is not to ensure every disk image is unique but to ease segment identification within the disk image. That a UUID provides uniqueness properties beyond the scope of the disk image is a by-product of the UUID's use.