Mac – APFS: fsroot tree is invalid after Time Machine backup – how to recover and avoid in the future

apfs, disk-utility, fsck, high sierra, time-machine

System

MacBook Pro, late 2013, 1 TB SSD (brand new, recently replaced by Apple), APFS (no journaling, case insensitive), High Sierra 10.13.2, Time Machine to network HDD.

What Happened

  • Mac stopped working, no space left on device.
  • Reboot failed.
  • Tried to boot into recovery mode with Command-R and run First Aid from Disk Utility. This failed, apparently because the recovery system resides on the same disk, which seems to make fsck on APFS impossible.
  • Tried to manually delete some files via rm, got "no space left on device".
  • Tried to truncate some files manually via cat /dev/null > somefile, got "no space left on device".
  • Booted into recovery mode with Shift-Command-R (downloads the system from Internet) and ran First Aid again. This time with limited success:

    ** Checking volume.
    ** Checking the container superblock.
    ** Checking the EFI jumpstart record.
    ** Checking the space manager.
    ** Checking the object map.
    ** Checking the APFS volume superblock.
    ** Checking the object map.
    error: invalid dstream.size (10730881024), is greater than dstream.alloced_size (71151616)
    error: xf : INO_EXT_TYPE_DSTREAM : invalid dstream
    error: inode_val: object (oid 0x16309a1): invalid xfields
    ** Checking the fsroot tree.
       fsroot tree is invalid.
    ** The volume /dev/rdisk2s1 could not be verified completely.
    

So apparently the fsroot tree is invalid. I've searched, but wasn't able to find any usable advice on how to fix this (except, of course, to reformat and restore from backup, which I'd like to avoid).

Additional Background Info

The system hosts a Parallels Windows VM with a 100 GB virtual hard drive (yes, one big file), which had recently been used (so a backup was needed). The last time I used the computer, roughly 20 GB were still free on the macOS SSD. For a day or so, Time Machine backups had not completed, but no error message was shown. When the problem happened, I had left the machine on overnight to complete an incremental Time Machine backup. The connection is that Time Machine apparently uses APFS snapshots, and I suspect this is the root cause of the mess.

Questions

  1. Is there a way to fix this (without reformat and restore from backup)?
  2. What's the best way to avoid this in the future (especially with regard to Time Machine)?

Thanks.

Update

When running fsck_apfs with the debug flag -d, the output contains a bit more information:

** Checking volume.
** Checking the container superblock.
** Checking the EFI jumpstart record.
** Checking the space manager.
** Checking the object map.
** Checking the APFS volume superblock.
** Checking the object map.
error: invalid dstream.size (10730881024), is greater than dstream.alloced_size (71151616)
error: xf : INO_EXT_TYPE_DSTREAM : invalid dstream
error: inode_val: object (oid 0x16309a1): invalid xfields
obj-id: 23267745 type: Inode      
private-id: 23267745 parent-id: 12896552 cr/mtime: 1515089959653928186/1515090145416398252 
def-prot-class: 0 
uid/gid/mode: 0/0/0x8180 bsd_flags: 0x0 internal_flags: 0x8280 name: NO-NAME
** Checking the fsroot tree.
   fsroot tree is invalid.
** The volume /dev/disk2 could not be verified completely.
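
The numbers in the debug output can be cross-checked with plain shell arithmetic. This is a minimal sketch, assuming a POSIX-style shell; the helper names are made up for illustration. It shows that the oid in the error line is the obj-id written in hexadecimal, and that the mode corresponds to a regular file with permissions 0600. Reading cr/mtime as nanoseconds since the Unix epoch is an assumption based on their magnitude.

```shell
# Hypothetical helpers for decoding fields from the fsck_apfs -d output.

# Convert a hexadecimal value (e.g. the oid) to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Convert a hexadecimal mode to octal, the usual notation for file modes.
hex_to_octal() {
    printf '%o\n' "$1"
}

# Convert nanoseconds to whole seconds (assumes cr/mtime are nanoseconds
# since the Unix epoch; the output itself does not say so).
ns_to_s() {
    echo $(( $1 / 1000000000 ))
}

hex_to_dec 0x16309a1          # prints 23267745, the obj-id
hex_to_octal 0x8180           # prints 100600: regular file, mode 0600
ns_to_s 1515089959653928186   # prints 1515089959
```

The resulting seconds value falls on 4 January 2018 (UTC), which is consistent with the file having been created shortly before the failure.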

Best Answer

I just ran into a similar issue. You would likely have found that the problem was in one of the files for the Parallels VM; at least, that was the culprit in my case. Your fsck_apfs -d /dev/<disk> check returned:

obj-id: 23267745 type: Inode

If you had opened Terminal, you could have found the path of the file (or files) with that inode number using the following command:

find / -inum 23267745

From there you would have known which file(s) needed to be restored instead of doing a full restore.
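
A slightly more defensive variant of that search, as a sketch only: the function name find_by_inum is made up, and the optional second argument is an assumption for the case where the damaged volume is mounted somewhere other than / (for example when working from recovery mode). Inode numbers are only unique within one volume, so -xdev keeps find from crossing into other mounted file systems and reporting unrelated files.

```shell
# Hypothetical helper: print every path on one volume that uses the given inode.
#   $1 = inode number (the obj-id reported by fsck_apfs -d)
#   $2 = directory to start from, defaults to /
find_by_inum() {
    # -xdev: stay on the file system that contains the start directory.
    # Errors such as "Permission denied" are discarded to keep the output readable.
    find "${2:-/}" -xdev -inum "$1" 2>/dev/null
}

# Example call for the inode from the question (may take a long time on a full disk):
#   find_by_inum 23267745
```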

In my case the VM file was only available in a local snapshot, as I exclude my VMs from Time Machine. I restored just that file from an earlier snapshot and got further through fsck_apfs: it made it through the disk to checking snapshots and then failed on the same file in the second snapshot. Luckily, snapshots are kept for at most 24 hours, so the problem should clear up after that point.
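
Local snapshots can also be inspected by hand instead of waiting for them to expire. The following is a minimal sketch, not a tested recovery procedure. It assumes that tmutil on High Sierra names snapshots com.apple.TimeMachine.<date>; the helper snapshot_date and the example date are made up for illustration.

```shell
# Hypothetical helper: strip the assumed "com.apple.TimeMachine." prefix so the
# remaining date string can be passed to "tmutil deletelocalsnapshots".
snapshot_date() {
    printf '%s\n' "${1#com.apple.TimeMachine.}"
}

# Only call tmutil where it exists (i.e. on macOS).
if command -v tmutil >/dev/null 2>&1; then
    # List the local Time Machine snapshots of the boot volume.
    tmutil listlocalsnapshots /
fi

# Deleting a single snapshot takes its date (made-up example, may need sudo):
#   tmutil deletelocalsnapshots "$(snapshot_date com.apple.TimeMachine.2018-01-04-181919)"
```

Deleting snapshots frees the space they pin, which also bears on the second question: a large VM disk image that changes often can tie up a lot of space in local snapshots.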

Your mileage may vary, however, as it could be as "simple" as one file or just the tip of the iceberg.