“write-once archive”: ext2 vs ext4^has_journal vs

Tags: backup, ext2, ext4, filesystems, journaling

summary

Suppose I am setting up an external drive to be a "write-once archive": I intend to reformat it, copy over some files that will (hopefully) never be updated, then set it aside until I need to read something from it on another Linux box (which could be a long while, or never). I also want to fit as much data as possible onto the archive; i.e., I want the filesystem to consume as little space as possible for its own purposes.

specific question 1: which filesystem would be better for this usecase: ext2, or ext4 without journaling?

Since I've never done the latter before (I usually do this sort of thing with GParted), just to be sure:

specific question 2: is "the way" to create a journal-less ext4 filesystem mke2fs -t ext4 -O ^has_journal /dev/whatever ?

general question 3: is there a better filesystem for this usecase? or Something Completely Different?

details

I've got a buncha files from old projects on dead boxes (which will therefore never be updated) saved on various external drives. Collectively size(files) ~= 250 GB. That's too big for DVDs (i.e., it would require too many, unless I'm missing something), and I don't have a tape drive. Hence I'm setting up an old USB2 HFS external drive to be their archive. I'd prefer to use a "real Linux" filesystem, but would also prefer a filesystem that

  1. consumes minimum space on the archive drive (since it's just barely big enough to hold what I want to put on it).
  2. will be readable from whatever (presumably Linux) box I'll be using in future.

I had planned to do the following sequence with GParted: [delete old partitions, create single new partition, create ext2 filesystem, relabel]. However, I read here that

recent Linux kernels support a journal-less mode of ext4
which provides benefits not found with ext2

and noted the following text in man mkfs.ext4

"mke2fs -t ext3 -O ^has_journal /dev/hdXX"
will create a filesystem that does not have a journal

So I'd like to know

  1. Which filesystem would be better for this usecase: ext2, or ext4 without journaling?
  2. Presuming I go ext4-minus-journal, is the command line to create it mke2fs -t ext4 -O ^has_journal /dev/whatever ?
  3. Is there another, even-better filesystem for this usecase?

Best Answer

I don't agree with the squashfs recommendations. You don't usually write a squashfs image to a raw block device; think of it as an easily readable tar archive. That means you would still need an underlying filesystem.

ext2 has several severe limitations that reduce its usefulness today; I would therefore recommend ext4. Since this is meant for archiving, you would create compressed archives to go on it; that means you would have a small number of fairly large files that rarely change. You can optimize for that:

  • Specify -I 128 to reduce the size of individual inodes, which reduces the size of the inode table.
  • You can play with the -i option (bytes per inode) too, to reduce the size of the inode table even further. If you increase this value, fewer inodes will be created, and therefore the inode table will also be smaller. However, the filesystem can then hold fewer files before it runs out of inodes. This is therefore a bit of a trade-off, though a mild one when you store a small number of large files.
  • You can indeed switch off the journal with -O ^has_journal. If you go down that route, though, I recommend that you set default options to mount the filesystem read-only; you can do this in fstab, or you could use tune2fs -E mount_opts=ro to record a default in the filesystem (you cannot do this at mkfs time).
  • You should of course compress your data into archive files, so that the inode wastage isn't as bad a problem as it could be. You could create squashfs images, but xz compresses better, so I would recommend tar.xz files instead.
  • You could also reduce the number of reserved blocks with the -m option to either mkfs or tune2fs. This sets the percentage (set to 5 by default) which is reserved for root only. Don't set it to zero; the filesystem requires some space for efficient operation.
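
As a rough sketch of how these options fit together, the script below estimates the inode-table saving and then prints (without running) the commands I have in mind. It makes several assumptions that are not part of the advice above: a partition of about 250 GB at the placeholder path /dev/sdX1, the stock mke2fs.conf defaults of 16384 bytes per inode and 256-byte inodes, and the illustrative choices -i 1048576 and -m 1. The estimate is simply (filesystem size / bytes per inode) * inode size, which ignores block-group rounding.

```shell
#!/bin/sh
# Sketch only: estimates inode-table size and PRINTS the commands; nothing is formatted.
# DEVICE, FS_BYTES and the -i / -m values are illustrative assumptions.

DEVICE="/dev/sdX1"                       # hypothetical partition; replace with the real one
FS_BYTES=$((250 * 1000 * 1000 * 1000))   # assumed partition size of about 250 GB

# inode_table_bytes BYTES_PER_INODE INODE_SIZE
# Approximation: one inode per BYTES_PER_INODE bytes of filesystem,
# each inode occupying INODE_SIZE bytes in the inode table.
inode_table_bytes() {
    echo $(( FS_BYTES / $1 * $2 ))
}

default=$(inode_table_bytes 16384 256)    # assumed stock defaults
tuned=$(inode_table_bytes 1048576 128)    # with -i 1048576 -I 128

echo "default inode table: $((default / 1000000)) MB"
echo "tuned inode table:   $((tuned / 1000000)) MB"

# Commands to run by hand once DEVICE is correct (mke2fs destroys existing data).
echo "mke2fs -t ext4 -O ^has_journal -I 128 -i 1048576 -m 1 $DEVICE"
echo "tune2fs -E mount_opts=ro $DEVICE"
```

Under these assumptions the estimated inode table shrinks from roughly 3.9 GB to roughly 30 MB. One thing to check before committing to -I 128: I believe recent e2fsprogs releases warn that 128-byte inodes cannot store timestamps past 2038, which may matter for a long-lived archive.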