APFS Snapshot Invalid – How to Fix

Tags: apfs, disk-utility, macos, time-machine

Running fsck_apfs reports an error checking my main disk:

root@bix ~ # fsck_apfs -n -l /dev/disk1s1
** Checking the container superblock.
** Checking the EFI jumpstart record.
** Checking the space manager.
** Checking the space manager free queue trees.
** Checking the object map.
** Checking volume.
** Checking the APFS volume superblock.
** The volume Macintosh HD - Data was formatted by hfs_convert (748.1.46) and last modified by apfs_kext (1412.81.1).
** Checking the object map.
** Checking the snapshot metadata tree.
** Checking the snapshot metadata.
** Checking snapshot 1 of 2 (com.apple.apfs.purgatory.84779e)
error: sibling_map_val object (oid 0x2c6200000000168): invalid length (20)
   Snapshot is invalid.
** The volume /dev/disk1s1 could not be verified completely.

The output is the same in both Recovery Mode and Safe Mode (without the disk being mounted).

Searching online for "sibling_map_val object" only yields some Hopper disassembler code, which makes me think that this is an unusual error to have. However, lots of people mention the "Snapshot is invalid" output.

Is there any way to force-delete the snapshot? The system doesn't report that any exist:

root@bix ~ # tmutil listlocalsnapshots /
Snapshots for volume group containing disk /:

The root problem is that my main disk keeps filling up, despite my deleting hundreds of GBs of files. DaisyDisk attributes the space to inaccessible hidden system files (even after granting it elevated permissions via the non-App-Store version of the utility). I suspect that the "com.apple.apfs.purgatory.84779e" snapshot mentioned in the fsck output is a corrupt local snapshot, but I don't see any way to trash it.

I've tried disabling/enabling Time Machine backups and Spotlight indexing to no avail. Rebooting always winds up recovering about 5GB, but then it quickly drops down to about 2GB of free space and hovers around there until the system starts complaining that there isn't enough system memory for my applications to remain open. And since I can't free up sufficient space, Time Machine complains that it doesn't have enough space to create a local snapshot, and so my disk isn't being backed up (and so I can't just wipe out the whole file system and restore from Time Machine). I'm stuck with a disk that keeps on filling itself up.

Best Answer

error: sibling_map_val

Let's break this down. First, according to the APFS spec (PDF):

Hard links that all refer to the same inode are called siblings. Each sibling has its own identifier thatʼs used instead of the shared inode number when siblings need to be distinguished. ... You use sibling links and sibling maps to convert between sibling identifiers and inode numbers. Sibling-link records let you find all the hard links whose target is a given inode. Sibling-map records let you find the target inode of a given hard link.

So the sibling map works like a two-column spreadsheet: the key is the sibling identifier of one hard link, and the value is the inode number of the file that link points to. In this case, the value stored for one of those entries is not the correct length, which indicates that the record is corrupted.
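To make the "wrong length" idea concrete, here is a minimal sketch that pulls the reported length out of the fsck_apfs error line and compares it with what a sibling-map value should occupy. It assumes, based on the APFS reference, that the value is a single 64-bit inode number (8 bytes) and that fsck reports the length in bytes; EXPECTED_LEN and sibling_map_len are illustrative names, not part of any Apple tool.

```shell
#!/bin/sh
# Assumption: a sibling-map value holds one 64-bit inode number, i.e. 8 bytes.
# EXPECTED_LEN and sibling_map_len are hypothetical names for this sketch.
EXPECTED_LEN=8

sibling_map_len() {
    # Extract N from "... sibling_map_val object (...): invalid length (N)".
    sed -n 's/.*sibling_map_val object.*invalid length (\([0-9][0-9]*\)).*/\1/p'
}

# The error line copied from the question:
line='error: sibling_map_val object (oid 0x2c6200000000168): invalid length (20)'
len=$(printf '%s\n' "$line" | sibling_map_len)

if [ "$len" -ne "$EXPECTED_LEN" ]; then
    echo "value is $len bytes, expected $EXPECTED_LEN: record is malformed"
fi
```

This only illustrates why fsck rejects the record; it does not repair anything.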

Further, that corrupted data appears to be in an incomplete snapshot, so the solution is to delete that snapshot, which can be quite difficult.

Possible Solutions (Least to Most Destructive)

1. Delete Local Snapshot

Yes, you mentioned this, but it's an important first step. First, make sure you turn off Time Machine.

Is there any way to force-delete the snapshot?

Yes, and you may as well make a script of it because it's a frequent problem. 99% of the time, the culprit is the oldest snapshot or the one labeled dateless.

tmutil listlocalsnapshots /

The output of that command looks like this:

Snapshots for volume group containing disk /:
com.apple.TimeMachine.2020-04-01-122516.local
com.apple.TimeMachine.2020-04-01-132348.local
com.apple.TimeMachine.2020-04-01-143800.local
com.apple.TimeMachine.2020-04-01-153811.local
com.apple.TimeMachine.2020-04-01-183757.local
com.apple.TimeMachine.2020-04-01-193758.local
com.apple.TimeMachine.2020-04-01-203828.local

You just need to copy the timestamp for each line you want to kill and paste it into the next command. Again, usually deleting just the oldest (top) one will resolve related issues.

sudo tmutil deletelocalsnapshots 2020-04-01-122516

If successful, you will get no response (exit 0) in terminal.
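Since this comes up often, the copy-and-paste step can be scripted. The following is a rough sketch, not a tested tool: snapshot_dates and oldest_snapshot_date are hypothetical helper names, and the parsing assumes the listing format shown above. The demo runs on a pasted sample, so nothing is deleted unless you uncomment the last lines on a real Mac.

```shell
#!/bin/sh
# Sketch: pull the timestamps out of `tmutil listlocalsnapshots /` output.
# snapshot_dates and oldest_snapshot_date are hypothetical helpers.
snapshot_dates() {
    # Keep only lines shaped like com.apple.TimeMachine.<timestamp>.local
    # and print the <timestamp> part, one per line.
    sed -n 's/^com\.apple\.TimeMachine\.\([0-9-]*\)\.local$/\1/p'
}

oldest_snapshot_date() {
    # Timestamps are YYYY-MM-DD-HHMMSS, so a plain sort is chronological.
    snapshot_dates | sort | head -n 1
}

# Demo on a shortened copy of the sample listing (no tmutil needed):
sample='Snapshots for volume group containing disk /:
com.apple.TimeMachine.2020-04-01-122516.local
com.apple.TimeMachine.2020-04-01-132348.local
com.apple.TimeMachine.2020-04-01-203828.local'
printf '%s\n' "$sample" | oldest_snapshot_date

# On a real system (assumption: macOS, run by an admin user):
#   oldest=$(tmutil listlocalsnapshots / | oldest_snapshot_date)
#   [ -n "$oldest" ] && sudo tmutil deletelocalsnapshots "$oldest"
```

Sorting before taking the first line means the script does not rely on tmutil listing the oldest snapshot first.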

2. Delete the Offending Snapshot

WARNING: You should not proceed if you don't have a full backup of your drive. You could lose some data. You could lose all your data.

Reboot into recovery mode, open the terminal (commands there run as the root user), and try to find the location of the snapshot. Something like:

find / -name 'com.apple.apfs.purgatory.84779e' # Totally untested

Once you find it, rm that file. If you are unable to locate the file, on to step 3.

3. Do a Safe Reinstall

While still in recovery mode, exit terminal and perform a reinstall of the operating system. This is "safe" in that only the system files are recreated. Your $HOME folder will be left in place, so if all goes smoothly, you shouldn't have to recover your hard drive from a backup.

Once finished, run fsck_apfs again to verify the issue is resolved. If not...
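To make that re-check less error-prone, you could filter the fsck_apfs output for the lines that signalled trouble in the original report. This is a small sketch: fsck_problems is a hypothetical helper, and the patterns come only from the messages quoted in the question, so other failure messages would slip through.

```shell
#!/bin/sh
# fsck_problems is a hypothetical helper: it prints the lines of fsck_apfs
# output that signal trouble, based only on the messages quoted in the question.
fsck_problems() {
    grep -E '^error:|Snapshot is invalid|could not be verified completely'
}

# Demo on lines copied from the report in the question:
printf '%s\n' \
    '** Checking the snapshot metadata.' \
    'error: sibling_map_val object (oid 0x2c6200000000168): invalid length (20)' \
    '   Snapshot is invalid.' \
    '** The volume /dev/disk1s1 could not be verified completely.' \
  | fsck_problems

# On a real system (assumption: same device and flags as in the question):
#   if fsck_apfs -n -l /dev/disk1s1 2>&1 | fsck_problems; then
#       echo "still broken"
#   fi
```

Because grep exits non-zero when nothing matches, the commented-out `if` only reports a problem when at least one suspicious line is present.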

4. Do a Full Reinstall

Still in recovery mode, open Disk Utility and delete the partition on which the OS is installed. Doing this will delete all of your information. Recreate the partition (consider using HFS+ if you have frequent issues with APFS) just as it was before. Exit Disk Utility back into recovery mode.

Before doing a reinstall, use fsck_apfs on the new partition to verify that it doesn't come back with any errors containing the word physical. Any errors at this point likely indicate an issue with the hard drive itself, and it may need to be replaced. Examples of such errors include:

  • Unable to mark physical extent range
  • found physical extent corruption

Try repairing, of course, but if you aren't successful, consider replacing your drive.
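The "look for the word physical" check can also be done mechanically. A minimal sketch, assuming you pipe the fsck_apfs output of the freshly created partition into it; has_physical_errors is a hypothetical helper name, and the demo uses one of the example messages listed above rather than real output.

```shell
#!/bin/sh
# has_physical_errors is a hypothetical helper: it succeeds (exit 0) when the
# fsck_apfs output on stdin contains the word "physical", in any letter case.
has_physical_errors() {
    grep -qi 'physical'
}

# Demo with one of the example messages listed above:
if printf '%s\n' '** Checking the space manager.' 'found physical extent corruption' \
    | has_physical_errors; then
    echo "possible hardware problem"
else
    echo "no physical errors reported"
fi
```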

Then proceed with an install just as you did in step 3, followed by running a recovery from your most recent backup.

Good luck.