Btrfs – Understanding Snapshots and Space Usage

Tags: btrfs, filesystems

Just starting to learn about btrfs and considering switching.

My current understanding of btrfs is that it operates pretty much like git, with everything tracked and a commit happening every 30 seconds after changes. But my gut is telling me I must be misunderstanding, or hard drive space would get used up much faster. So I'm wondering if it's more like git with everything tracked, files added to the staging area every 30 seconds after changes, and files only being committed on snapshots.

  1. If I don't do snapshots, can you roll back a single file to several changes ago? Or is that only kept if you do snapshots? I.e., if you run a for loop 10,000 times appending to a file with 31-second sleeps in between, are you going to see an ancestry tree for that file of 10,000 entries, and can you go back to each of those?

  2. Can btrfs snapshots of root be used and thought of just like VMware/VirtualBox snapshots? Where you can shut down in one, save its state, move to another, boot, make changes so you have a diverging snapshot branch, and move wherever along the tree you want? If so, is there a bootloader that lets you pick a snapshot tree node? (Without making grub.cfg menu entries for each snapshot.)

  3. I label snapshot A, make changes and label it B. If I go back to snapshot A and make changes (even just by booting and changing /var/log), are those changes made in a "detached" or "unlabeled" snapshot, so those changes would be invisible if going back to B? If so, what happens if I have changes in this "unlabeled" snapshot, and accidentally ask to change to another before labeling it?

  4. When deleting a file, is there "this file is deleted" metadata written, so space is still taken by all the versions of the file? Or, does it delete all previous versions, assuming there's no snapshot still pointing to it?

  5. If I build gcc from source, as an example, I think the build directory winds up being 5-8GB. If I build it periodically from source, I'm "chewing up" a bunch of hard drive space, right? (Even assuming delete removes everything for the file being deleted, I don't know how many files are deleted in the build process without a make clean — whether existing object files are technically deleted or just "re-written inside" of them.)

Best Answer

I think that most of your questions can be answered simply by remembering that in Btrfs, a snapshot is not really special: it's just a Btrfs subvolume. The only difference is that when it's created, it has initial contents instead of being empty, and the storage space for those initial contents is shared with whatever subvolume the snapshot came from.

A snapshot is just like a (full) copy, except it's more economical because of the shared storage.
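To make the "copy with shared storage" idea concrete, here is a toy model in Python. It is only a sketch and not how Btrfs is implemented: the class ToyFilesystem, its methods, and the names "root", "root-snap", and "notes.txt" are all made up for illustration. A subvolume is modeled as a table mapping file names to extents, a snapshot copies only that table, and every write goes to a fresh extent.

```python
from itertools import count


class ToyFilesystem:
    """Hypothetical toy model of copy-on-write subvolumes (not real Btrfs)."""

    def __init__(self):
        self.extents = {}      # extent id -> bytes stored in that extent
        self.subvolumes = {}   # subvolume name -> {file name: extent id}
        self._ids = count(1)

    def create_subvolume(self, name):
        self.subvolumes[name] = {}

    def snapshot(self, source, name):
        # Copy only the file -> extent table; the data itself stays shared.
        self.subvolumes[name] = dict(self.subvolumes[source])

    def write(self, subvolume, filename, data):
        # Copy-on-write: new data always lands in a fresh extent.
        extent_id = next(self._ids)
        self.extents[extent_id] = data
        self.subvolumes[subvolume][filename] = extent_id
        self._free_unreferenced()

    def read(self, subvolume, filename):
        return self.extents[self.subvolumes[subvolume][filename]]

    def used_bytes(self):
        return sum(len(data) for data in self.extents.values())

    def _free_unreferenced(self):
        # An extent is kept only while some subvolume still points at it.
        live = {e for files in self.subvolumes.values() for e in files.values()}
        for extent_id in list(self.extents):
            if extent_id not in live:
                del self.extents[extent_id]


fs = ToyFilesystem()
fs.create_subvolume("root")
fs.write("root", "notes.txt", b"version 1")

fs.snapshot("root", "root-snap")
used_after_snapshot = fs.used_bytes()   # snapshot itself stores no new data

fs.write("root", "notes.txt", b"version 2")
used_after_change = fs.used_bytes()     # old extent is kept for the snapshot

print(fs.read("root", "notes.txt"))
print(fs.read("root-snap", "notes.txt"))
print(used_after_snapshot, used_after_change)
```

In this model the snapshot costs nothing until one side changes, and the old version survives only because the snapshot still points at it. Without the snapshot call, the second write would leave the first extent unreferenced and it would be freed.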

  1. If I don't do snapshots, can you roll back a single file to several changes ago?

No. Just like with any regular filesystem, modifying files is destructive. You can't magically go back to an earlier version.
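If you do want to keep earlier versions, you have to take a snapshot yourself before each change you care about. A minimal sketch of how that could be scripted, assuming btrfs-progs is installed and the source path is a subvolume: the helper names and the paths /home and /snapshots/home-1 are hypothetical, and only the command line itself (btrfs subvolume snapshot, with -r for a read-only snapshot) comes from the btrfs tool.

```python
import subprocess


def snapshot_command(source, dest, read_only=True):
    """Build the argv for `btrfs subvolume snapshot` (hypothetical helper)."""
    cmd = ["btrfs", "subvolume", "snapshot"]
    if read_only:
        cmd.append("-r")   # -r makes the new snapshot read-only
    cmd += [source, dest]
    return cmd


def take_snapshot(source, dest, read_only=True):
    # Actually running this needs btrfs-progs and sufficient privileges,
    # and `source` must be a Btrfs subvolume.
    subprocess.run(snapshot_command(source, dest, read_only), check=True)


# Only builds and prints the command; nothing is executed here.
# The paths are placeholders, not taken from the question.
print(" ".join(snapshot_command("/home", "/snapshots/home-1")))
```

Each snapshot taken this way is a separate, named subvolume, so going back to an earlier version just means copying the file out of the snapshot you want.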

  2. Can btrfs snapshots of root be used and thought of just like VMware/VirtualBox snapshots?

VM disk images are usually block devices, not filesystems or files on filesystems, so I think it's a little different.

You could use a file on Btrfs as the backing store for a VM's virtual block device, I guess, in which case the answer to that question is yes. The exception is if you use the NOCOW option (which is actually recommended for disk images). Then probably not, because copy-on-write is the magic that makes snapshots work.

  3. I label snapshot A, make changes and label it B. If I go back to snapshot A and make changes (even just by booting and changing /var/log), are those changes made in a "detached" or "unlabeled" snapshot, so those changes would be invisible if going back to B?

Every subvolume (including snapshots) in Btrfs has a name, so you cannot have an "unlabeled" snapshot.

In general, any changes you make in one Btrfs subvolume (whether it was created as a snapshot or not) are absolutely not ever visible in another Btrfs subvolume. Just remember that a snapshot is just like a copy, but more economical.

  4. When deleting a file, is there "this file is deleted" metadata written, so space is still taken by all the versions of the file?

When deleting a file, its directory entry is removed. That is a modification to the directory, and like all modifications, it will be private to the subvolume in which it occurred. Then after that, if and only if the storage space for the file is not used by any other part of the filesystem, it's freed.

Deleting a file whose storage is shared between multiple snapshots is a lot like deleting a file in any regular filesystem when it has multiple (hard) links. The storage (the inode, in the hard-link case) is freed if and only if nothing references it anymore.
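The hard-link half of that analogy is easy to try on any POSIX filesystem, Btrfs or not. A small sketch in Python, using a temporary directory and two made-up file names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first-name")
    second = os.path.join(tmp, "second-name")

    with open(first, "w") as f:
        f.write("shared data")

    # Give the same inode a second directory entry.
    os.link(first, second)
    links_before = os.stat(first).st_nlink

    # Removing one name only drops that directory entry.
    os.remove(first)
    links_after = os.stat(second).st_nlink

    # The data is still reachable through the remaining name.
    with open(second) as f:
        surviving = f.read()

print(links_before, links_after, surviving)
```

The data goes away only when the last name is removed, which is the same rule Btrfs applies to storage shared between subvolumes.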

  5. If I build gcc from source, as an example, I think the build directory winds up being 5-8GB. If I build it periodically from source, I'm "chewing up" a bunch of hard drive space, right?

If you build gcc multiple times in multiple different directories, then yeah, it will use more and more space. If you delete copies in between builds or overwrite the same build directory each time, then, no, there's no particular reason why it would keep using more and more space.
