Making bit identical ext2 filesystems

Tags: ext2, filesystems, mkfs, reproducible-build

I'm preparing an image file for a Linux system. I need to be able to run the script that creates the image and get bit-for-bit identical output each time.

I follow the normal procedure: make a large binary file, partition it, create a loop device for the partition, and make the filesystem. I then mount the filesystem, copy the syslinux and initrd files over, unmount the partition, delete the loop device, and I have my image file. I can dd it to a disk and the Linux system boots correctly, so I'm making the filesystem correctly.

I run my script that performs the above steps, but each time the output differs. Some of the differences are timestamps in the ext2 data structures. I wrote a program that reads the ext2 structures and clears the timestamps, and tune2fs can clear a few more things, but some of the bitmap data still differs, and the file data doesn't even seem to end up in the same place each time.

So how would I go about creating identical filesystems?

Here are the commands I use to create a filesystem, put a file on it, and unmount it. If you save the output, run the commands again, and compare the two images, the file a.txt ends up in different locations.

```
dd if=/dev/zero bs=1024 count=46112 of=cf.bin
parted cf.bin <<EOF
unit
s
mklabel
msdos
mkpart
p
ext2
63s
45119s
set
1
boot
on
q
EOF

losetup -o $((63 * 512)) /dev/loop0 cf.bin

mke2fs -b 1024 -t ext2 /dev/loop0 22528

# clear some parameters
tune2fs -i 0 /dev/loop0 # interval between checks
tune2fs -L LABEL /dev/loop0 # volume label
tune2fs -U 00000000-0000-0000-0000-000000000000 /dev/loop0 # UUID
tune2fs -c 0 /dev/loop0 # max mount count

mkdir -p mnt
mount /dev/loop0 mnt
# make a dummy file
echo HELLO > mnt/a.txt
umount mnt

losetup -d /dev/loop0
```
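To see exactly where two runs differ, a small helper around `cmp -l` can list the differing byte offsets and map each one to a 1 KiB filesystem block inside the partition. This is a sketch that assumes the layout above (partition at sector 63, `-b 1024`); `diff_blocks` and `PART_OFFSET` are hypothetical names.

```shell
#!/bin/sh
# Byte offset of the ext2 partition inside the image (sector 63, as above).
PART_OFFSET=$((63 * 512))

# Print the first N (default 10) differing bytes of two images as a 0-based
# hex offset plus the 1 KiB filesystem block it falls in ("-" if the byte
# is before the partition, e.g. in the MBR).
diff_blocks() {
    # cmp -l prints "<1-based offset> <octal byte in $1> <octal byte in $2>"
    cmp -l "$1" "$2" | head -n "${3:-10}" | while read -r off a b; do
        byte=$((off - 1))
        if [ "$byte" -ge "$PART_OFFSET" ]; then
            block=$(( (byte - PART_OFFSET) / 1024 ))
        else
            block=-
        fi
        printf '0x%08x fs-block %s\n' "$byte" "$block"
    done
}

# Usage: diff_blocks run1.bin run2.bin 20
```

The block numbers can then be compared against the layout printed by `dumpe2fs` for a partition image to tell bitmap blocks from file data.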

Update
If I put the above commands in a script, copy and paste them so they run a second time (saving the output in between), and even change the date before the second run (using the date command), a.txt ends up in the same disk location. But if I run the script, save the output, and then run the script again from the command line, a.txt is in different locations. Very curious behavior. What data is used to choose the file locations? Clearly it's not the time. The only difference I can think of between running the script twice and running the commands twice within the same script is something like the process ID of the calling process. Ideas, anyone?
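One plausible explanation, offered as an assumption to verify against your kernel's fs/ext2/inode.c rather than as an established answer: when ext2 allocates the first data block of a small file, `ext2_find_near()` aims at the start of the inode's block group plus `(pid % 16) * (blocks_per_group / 16)`, where `pid` belongs to the process doing the write. Because `echo` is a shell builtin, that process is the script's own shell, which would explain why repeating the commands inside one script reproduces the layout while a fresh invocation (new PID) usually does not. The arithmetic, with defaults taken from the image above (1 KiB blocks, hence 8192 blocks per group and first data block 1), can be sketched as follows; `ext2_goal_block` is a hypothetical name.

```shell
#!/bin/sh
# Goal block that, under the assumption above, ext2 would try first for the
# first data block of a file whose inode lives in block group GROUP.
# Defaults match the image in the question: 1 KiB blocks -> 8192 blocks per
# group, first data block 1, group 0 (where the root directory lives).
ext2_goal_block() {
    pid=$1
    blocks_per_group=${2:-8192}
    first_data_block=${3:-1}
    group=${4:-0}
    group_start=$((group * blocks_per_group + first_data_block))
    colour=$(( (pid % 16) * (blocks_per_group / 16) ))
    echo $((group_start + colour))
}

# Goal block for the shell running this script:
echo "PID $$ -> goal block $(ext2_goal_block $$)"
```

If this is the mechanism, only `pid % 16` matters, so two runs whose shells happen to share that value would also produce the same placement.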

Update #2
I gave up on trying to use ext2. So I can't answer my original question about ext2, but I'll describe what I did to get a completely reproducible build of a basic Linux system.

1. Instead of ext2, use a FAT variant or ISO9660. If you need a partition smaller than 32 MB, use FAT16 for the Linux system partition; otherwise use FAT32. Both FAT16 and FAT32 consistently put files in the same locations, but they do store timestamps in their directory entries.
2. Add the Linux system files needed to boot.
3. Write a program to walk the FAT16/32 directory structures and set all timestamps to 0.
4. Clear the disk signature in the MBR, either in the program that clears the timestamps or with dd.
5. Since it's a FAT filesystem, I'm using syslinux as the boot loader. cpio produces identical initrds from run to run, so there are no issues there. This is all that is needed for a basic bit-for-bit identical Linux system.
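For step 4, the dd route can be as small as this, assuming a classic MBR where the 4-byte disk signature sits at byte offset 440 (0x1B8); `clear_disk_signature` is a hypothetical name.

```shell
#!/bin/sh
# Zero the 4-byte MBR disk signature at byte offset 440 (0x1B8) of an image,
# leaving the boot code, the partition table (from byte 446) and the 0x55AA
# marker untouched. conv=notrunc keeps the rest of the image intact.
clear_disk_signature() {
    printf '\000\000\000\000' |
        dd of="$1" bs=1 seek=440 count=4 conv=notrunc 2>/dev/null
}

# Usage: clear_disk_signature cf.bin
```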

Issues with FAT file systems

For just booting a Linux system, FAT shouldn't cause any problems. But for larger data partitions, there are a couple of issues with FAT32 that may crop up.

It is possible to bump into the maximum number of files in a directory. This isn't likely to be a problem (but of course, in my case it was).

FAT32 stores an 8.3 filename for each file. Long file names are shortened to a stem with a tilde and a number appended. But if more than 9 files map to the same short stem, FAT32 uses an undocumented procedure to generate a sort of hash to append to the file name instead. I dug into the Linux kernel code for FAT32, and it uses the time as a hash seed (the function vfat_create_shortname() in the file namei_vfat.c), so this field is not reproducible. I don't know how Microsoft's implementation does it. You may get away with just clearing this field, as I don't think the 8.3 names are used for anything other than DOS. Or you could generate your own unique numbers that you can reproduce; it doesn't matter what the numbers are, just that they're unique.
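A sketch of the "generate your own reproducible numbers" option follows. The helper names (`short_name`, `pad83`, `lfn_checksum`) are hypothetical, and the counter is assumed to come from something deterministic such as the file's position in a sorted list. The checksum matters if you rewrite or clear short names: each VFAT long-name entry stores a one-byte checksum of the 11-byte short name, so the long-name entries have to be updated to match, or they no longer belong to that file.

```shell
#!/bin/sh
# Deterministic 8-character base name STEM~N: keep letters and digits,
# uppercase them, and shorten the stem so the whole thing fits in 8 chars.
short_name() {
    stem=$(printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:lower:]' '[:upper:]')
    suffix="~$2"
    keep=$((8 - ${#suffix}))
    printf '%s%s\n' "$(printf '%s' "$stem" | cut -c "1-$keep")" "$suffix"
}

# 11-byte on-disk form: base padded to 8, extension padded to 3, no dot.
pad83() {
    printf '%-8s%-3s' "$1" "$2"
}

# Checksum stored in every long-file-name entry, computed over the 11-byte
# short name: rotate the running sum right by one bit, then add the byte.
lfn_checksum() {
    sum=0
    for b in $(printf '%s' "$1" | od -An -tu1 -v); do
        sum=$(( ( ((sum & 1) << 7) + (sum >> 1) + b ) & 255 ))
    done
    echo "$sum"
}

# Usage: lfn_checksum "$(pad83 "$(short_name longfilename 12)" TXT)"
```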

Using ISO9660 for an additional partition

1. Use genisoimage to create the ISO. It generates identical output from run to run, except for timestamps. The -l option allows file names of up to 31 characters; if you need longer file names, use the Rock Ridge extension (-R). The command is:
```
genisoimage -o gfx.iso -R -l -f assets/files/
```
2. Write a program that walks the ISO9660 filesystem and clears all timestamps, including the TF field of the Rock Ridge entries.
3. Use fdisk or parted to make a partition in your disk image. 96h is the MBR partition type ID for ISO9660.
4. If necessary, patch up the partition table. Parted doesn't support making a partition of type ISO9660. Unfortunately, I'm stuck with older versions of both parted and fdisk, and parted is easier to use, so I used parted to make my second partition as FAT32 and then used fdisk to change the type to 96.
5. Use dd to embed the ISO in the disk image, using the same numbers you used for making the partition. I used:
```
dd bs=512 seek="$part2_start_lba" conv=notrunc if=gfx.iso of=cf.bin
```
where cf.bin is my disk image file.
6. Mount the ISO partition after Linux has booted. If the ISO is the second partition, it will be /dev/sda2. You may have to use mknod to make the proper device file in /dev first.
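Two of these steps can be partly done with dd alone. This is a sketch under stated assumptions, with hypothetical helper names: `clear_pvd_dates` only resets the four dates in the Primary Volume Descriptor (ECMA-119 places it at sector 16, byte 32768, with the 17-byte dates at offsets 813, 830, 847 and 864 within it), so directory-record and Rock Ridge TF timestamps still need the walking program from step 2. `set_mbr_part_type` writes the type byte of an MBR slot directly, as an alternative to the fdisk round trip in step 4.

```shell
#!/bin/sh
# Reset the creation, modification, expiration and effective dates of the
# Primary Volume Descriptor to "not specified" (sixteen ASCII '0' digits and
# a zero timezone byte). Directory records and Rock Ridge TF are NOT touched.
clear_pvd_dates() {
    iso=$1 pvd=32768
    vd_type=$(od -An -tu1 -j "$pvd" -N1 "$iso" 2>/dev/null | tr -d ' \n')
    magic=$(dd if="$iso" bs=1 skip=$((pvd + 1)) count=5 2>/dev/null)
    if [ "$vd_type" != 1 ] || [ "$magic" != CD001 ]; then
        echo "clear_pvd_dates: no primary volume descriptor at byte $pvd" >&2
        return 1
    fi
    for off in 813 830 847 864; do
        printf '0000000000000000\000' |
            dd of="$iso" bs=1 seek=$((pvd + off)) conv=notrunc 2>/dev/null
    done
}

# Set the partition-type byte of MBR slot N (1-4). Entries start at byte 446,
# are 16 bytes long, and hold the type at offset 4 within the entry.
set_mbr_part_type() {
    img=$1 slot=$2 ptype=$(( $3 ))        # accepts 150 or 0x96
    offset=$((446 + 16 * (slot - 1) + 4))
    printf "\\$(printf '%03o' "$ptype")" |
        dd of="$img" bs=1 seek="$offset" count=1 conv=notrunc 2>/dev/null
}

# Usage: clear_pvd_dates gfx.iso && set_mbr_part_type cf.bin 2 0x96
```

Running `clear_pvd_dates` on gfx.iso before embedding it keeps the offsets relative to the ISO itself rather than to the disk image.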

Best Answer

IMHO this all seems overly complicated, when tar alone seems like the obvious solution. tar can create just about any file system, including cdfs (--options cd9660:*). It also lets you control the metadata recorded in the output, with options such as -m/--modification-time, --gid id/--gname name, --acls/--no-acls, --same-owner/--no-same-owner, ...

Or you could create your filesystem, run chown -Rh someone:somegroup . within your file tree, chmod it to your liking, and use either tar or rsync to place the file tree into your prepared filesystem. Then everything would be consistent: same date, same owner/group, and same permissions.
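The tree-normalizing step can be sketched as a shell function. `normalize_tree`, the fixed timestamp and the mode policy are illustrative choices, not part of the answer; `touch -h` is a GNU/BSD extension, and `chown` needs root. Note that this normalizes file metadata only; where the filesystem driver places the data blocks is a separate matter.

```shell
#!/bin/sh
# Give every entry in a staging tree the same mode policy, an optional owner,
# and one fixed mtime, before copying the tree into the image.
normalize_tree() {
    dir=$1 owner=$2                # owner is optional, e.g. 0:0 (needs root)
    if [ -n "$owner" ]; then
        chown -Rh "$owner" "$dir"
    fi
    chmod -R u=rwX,go=rX "$dir"    # dirs/executables 755, other files 644
    # Fixed mtime, read in UTC so the result does not depend on $TZ;
    # -h sets the time on symlinks themselves instead of their targets.
    TZ=UTC0 find "$dir" -exec touch -h -t 200001010000.00 {} +
}

# Usage: normalize_tree staging/ 0:0 && rsync -a staging/ mnt/
```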

Well that's the way I'd approach something like this. :)

HTH

Related Solutions

Largefile feature at creating file-system

The -T largefile flag adjusts the amount of inodes that are allocated at the creation of the file system. Once allocated, their number cannot be adjusted (at least for ext2/3, not fully sure about ext4). The default is one inode for every 16K of disk space. -T largefile makes it one inode for every megabyte.

Each file requires one inode. If you don't have any inodes left, you cannot create new files. But these statically allocated inodes take space, too. You can expect to save around 1.5 GB for every 100 GB of disk by setting -T largefile, as opposed to the default. -T largefile4 (one inode per 4 MB) does not have such a dramatic effect.

If you are certain that the average size of the files stored on the device will be above 1 megabyte, then by all means, set -T largefile. I'm happily using it on my storage partitions, and I don't think it is too radical a setting.

However, if you unpack a very large source tarball with many files (think hundreds of thousands) to that partition, you may run out of inodes on that partition. There is little you can do in that situation, apart from choosing another partition to untar to.

You can check how many inodes you have available on a live filesystem with the dumpe2fs command:

```
# dumpe2fs /dev/hda5
[...]
Inode count:              98784
Block count:              1574362
Reserved block count:     78718
Free blocks:              395001
Free inodes:              34750
```

Here, I can still create 34 thousand files.
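To get the same figure without scrolling through the whole dump, `dumpe2fs -h` prints only the superblock summary, and a small awk filter can pull out the two relevant lines. `free_inodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Read "dumpe2fs -h" output on stdin and report the free inode count.
# awk converts the padded field after the colon to a number.
free_inodes() {
    awk -F: '
        $1 == "Inode count" { total = $2 + 0 }
        $1 == "Free inodes" { free = $2 + 0 }
        END {
            if (total)
                printf "%d of %d inodes free (%.0f%%)\n", free, total, 100 * free / total
        }'
}

# Usage (as root): dumpe2fs -h /dev/hda5 2>/dev/null | free_inodes
```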

Here's what I got after doing mkfs.ext3 -T largefile -m 0 on a 100-GB partition:

```
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/loop1              102369       188    102181   1% /mnt/largefile
/dev/loop2              100794       188    100606   1% /mnt/normal
```

The largefile version has 102,400 inodes, while the normal one created 6,553,600 inodes; the largefile version saved 1.5 GB in the process.

If you have a good idea of what size files you are going to put on the file system, you can fine-tune the number of inodes directly with the -i switch, which sets the bytes-per-inode ratio. You would gain 75% of the space savings by using -i 65536 while still being able to create over a million files. I generally calculate to keep at least 100,000 inodes spare.
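These numbers can be reproduced with a little shell arithmetic, assuming 256-byte inodes. That inode size is an assumption, but it matches the df output above: 102369 − 100794 = 1575 MiB, which is exactly the difference between the default and largefile rows computed here (1600 − 25). `inode_overhead` is a hypothetical name, and real mke2fs rounds per block group, so treat the results as estimates.

```shell
#!/bin/sh
# Rough inode-table size for a filesystem of SIZE bytes at a given
# bytes-per-inode RATIO (what -i sets, and what -T picks), with an
# optional inode size (default 256 bytes).
inode_overhead() {
    size=$1 ratio=$2 inode_size=${3:-256}
    inodes=$((size / ratio))
    mib=$((inodes * inode_size / 1048576))
    echo "$inodes inodes, $mib MiB of inode tables"
}

size=$((100 * 1024 * 1024 * 1024))      # 100 GiB
for ratio in 16384 65536 1048576; do    # default, -i 65536, -T largefile
    printf '%8s: ' "$ratio"
    inode_overhead "$size" "$ratio"
done
```

With these figures, -i 65536 saves 1200 of the 1575 MiB that largefile saves, in line with the 75% estimate, while leaving 1,638,400 inodes.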

Ubuntu – What doesn’t need defragmentation? Linux or the ext2 ext3 FS

Here's an article on How To Geek about how ext2/ext3 allocates files on the disk. And they also have an article asking "Do you really need to defrag?"

On why FAT becomes fragmented:

"When you save a file to a FAT file system, [the file is saved] as close to the start of the disk as possible. When you save a second file, [the file is saved] right after the first file – and so on. When the original files grow in size, they will always become fragmented. There’s no nearby room for them to grow into."
-How To Geek
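The FAT behaviour in this quote can be illustrated with a toy first-fit model. This is not FAT code, just an awk simulation (`first_fit_demo` is a hypothetical name) of a 12-block disk where each allocation takes the lowest-numbered free blocks. File A gets 3 blocks, B gets the next 3, and when A then grows by 2 blocks, its new data lands after B, splitting A into two fragments.

```shell
#!/bin/sh
# Toy first-fit allocator: "." is a free block, a letter is the owning file.
first_fit_demo() {
    awk '
        function alloc(f, k,    i) {
            for (i = 0; i < n && k > 0; i++)
                if (disk[i] == ".") { disk[i] = f; k-- }
        }
        BEGIN {
            n = 12
            for (i = 0; i < n; i++) disk[i] = "."
            alloc("A", 3); alloc("B", 3)
            alloc("A", 2)            # A grows after B was written
            map = ""
            for (i = 0; i < n; i++) map = map disk[i]
            print map
        }'
}

first_fit_demo
```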

And Wikipedia has more information about FAT fragmentation.

On how EXT2,3,4 allocate files:

"ext2, ext3, and ext4 file systems [...] allocates files in a more intelligent way. Instead of placing multiple files near each other on the hard disk, Linux file systems scatter different files all over the disk, leaving a large amount of free space between them."
-How To Geek

(And more information on defragmentation on ext3, from Wikipedia)

"Modern Linux filesystem(s) keep fragmentation at a minimum by keeping all blocks in a file close together, even if they can't be stored in consecutive sectors. Some filesystems, like ext3, effectively allocate the free block that is nearest to other blocks in a file. Therefore it is not necessary to worry about fragmentation in a Linux system."
-TLDP
