Well, I'm close and think I've found a path to recovery. Since I haven't received advice from others, I'll post what I've learned so far.
Summary:
- There is an unmaintained, not-officially-supported labelfix utility for fixing labels on certain kinds of corrupted (and offline/detached) ZFS volumes, which can be used to make an unimportable pool importable.
- Before doing anything, be sure to clone the old spare devices and only work on the clones.
- If you have a situation like the one described in the question, with two pools of the same name (due to a mistaken create or other error), make sure the only devices plugged in are the ones belonging to the specific pool you want to recover.
- Also, remove any devices that may ever have been associated with the pool you want to recover but which are faulted. (That applies even if you think you've completely destroyed any other pools and disassociated these devices. The recovery tools will try to piece together fragments of old pools, and may read old labels/uberblocks to combine devices and data in unpredictable ways.)
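To make the last two points concrete, here is a minimal Python sketch of the triage step. It assumes you have already collected, for each attached device, the pool name, pool GUID, and whether it is faulted (for example by reading each device's labels with zdb -l). The LabelRecord type and both helper functions are hypothetical, not part of any ZFS tool. The idea is that the pool GUID, not the name, identifies the pool you want, so anything with a different GUID or a faulted state gets unplugged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelRecord:
    """Hypothetical summary of one device's ZFS label."""
    path: str          # device or clone image path
    pool_name: str     # pool name from the label
    pool_guid: int     # pool GUID from the label (what actually identifies the pool)
    faulted: bool = False


def name_collisions(records):
    """Return {pool_name: sorted GUIDs} for names used by more than one pool."""
    guids_by_name = defaultdict(set)
    for rec in records:
        guids_by_name[rec.pool_name].add(rec.pool_guid)
    return {name: sorted(guids) for name, guids in guids_by_name.items()
            if len(guids) > 1}


def plan_devices(records, target_guid):
    """Split device paths into (keep, unplug) for recovering target_guid.

    Keep only non-faulted devices whose label belongs to the target pool;
    everything else (other pools, or faulted members) should be unplugged.
    """
    keep, unplug = [], []
    for rec in records:
        if rec.pool_guid == target_guid and not rec.faulted:
            keep.append(rec.path)
        else:
            unplug.append(rec.path)
    return keep, unplug


if __name__ == "__main__":
    # Made-up example: two different pools that are both named "backup".
    devices = [
        LabelRecord("/dev/sdb", "backup", 111),
        LabelRecord("/dev/sdc", "backup", 222),
        LabelRecord("/dev/sdd", "backup", 111, faulted=True),
    ]
    print(name_collisions(devices))    # {'backup': [111, 222]}
    print(plan_devices(devices, 111))  # (['/dev/sdb'], ['/dev/sdc', '/dev/sdd'])
```

The device paths and GUIDs above are invented purely for illustration.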
More details:
There appears to be a way to recover offline and detached drives from zpools on Linux. A user, jjwhitney, created a port of the labelfix utility I mentioned in the question, which was originally written by Jeff Bonwick (the inventor of ZFS) nearly 12 years ago. For reasons I cannot fathom, this utility has not been incorporated into ZFS builds, even though it would allow recovery of data from intact pools whose import fails because of invalid labels. (Some discussion on the issue here.)
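For context on what labelfix is rewriting: as I understand the ZFS on-disk format, every vdev carries four 256 KiB labels, two at the start and two at the end. Each label holds an 8 KiB blank pad, an 8 KiB boot header, a 112 KiB config nvlist, and a 128 KiB uberblock ring. The Python sketch below reproduces that offset arithmetic, modeled on vdev_label_offset() in the ZFS source. Offsets are relative to the vdev itself (often a partition on a whole-disk vdev, not the raw disk). It only computes positions and reads nothing from disk.

```python
from __future__ import annotations

# Layout constants for a ZFS vdev label (vdev_label_t), as I understand it.
VDEV_LABELS = 4                     # two labels at the front, two at the back
LABEL_SIZE = 256 * 1024             # each label is 256 KiB
NVLIST_OFFSET = 16 * 1024           # after the 8 KiB pad + 8 KiB boot header
NVLIST_SIZE = 112 * 1024            # config nvlist (pool name, GUIDs, vdev tree)
UBERBLOCK_RING_OFFSET = 128 * 1024  # uberblock ring fills the last 128 KiB
UBERBLOCK_RING_SIZE = 128 * 1024


def label_offsets(vdev_size: int) -> list[int]:
    """Byte offsets of labels L0..L3 on a vdev of vdev_size bytes."""
    # The usable size is rounded down to a whole number of labels.
    psize = vdev_size - vdev_size % LABEL_SIZE
    if psize < VDEV_LABELS * LABEL_SIZE:
        raise ValueError("vdev too small to hold four labels")
    offsets = []
    for i in range(VDEV_LABELS):
        # L0/L1 start at 0; L2/L3 are packed against the (aligned) end.
        base = 0 if i < VDEV_LABELS // 2 else psize - VDEV_LABELS * LABEL_SIZE
        offsets.append(base + i * LABEL_SIZE)
    return offsets


def nvlist_range(label_offset: int) -> tuple[int, int]:
    """(start, end) byte range of the config nvlist inside one label."""
    start = label_offset + NVLIST_OFFSET
    return start, start + NVLIST_SIZE


def uberblock_ring_range(label_offset: int) -> tuple[int, int]:
    """(start, end) byte range of the uberblock ring inside one label."""
    start = label_offset + UBERBLOCK_RING_OFFSET
    return start, start + UBERBLOCK_RING_SIZE


if __name__ == "__main__":
    for off in label_offsets(1 << 30):  # a hypothetical 1 GiB vdev
        print(off, nvlist_range(off), uberblock_ring_range(off))
```

Because two of the four labels live at the very end of the device, a clone that is even slightly shorter than the original will lose L2/L3, which is one more reason to clone whole devices.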
(Sidenote: One thing this process has led me to realize is that ZFS recovery tools are severely lacking, and no one should be using this filesystem for anything without complete, continuously running backups of the data. And don't depend on that old mirrored drive sitting in your closet as a last-chance backup unless you're sure it's importable. ZFS is apparently great at maintaining data integrity when ZFS cooperates, but it is incredibly fragile. When it breaks, or when you do something minor but stupid, all your data can simply become inaccessible and unreadable, even if it is intact.)
In any case, the labelfix utility hasn't been updated in 5 years, so it doesn't compile against modern ZFS libraries. Luckily, I still had the original OS version installed and could boot into it, then download an old ZFS on Linux source tarball and use it to get the appropriate ZFS libraries and build environment on a system where everything still works. (I started tweaking the labelfix utility to work with modern ZFS libraries, but that seemed a little dangerous given how little I understand about the internals I'd need to update to match the current codebase. It was easier to just build it on an old version.)
And lo, labelfix immediately and easily rewrote the label on my device to something that zpool import could at least interpret!
I should say that I used ddrescue to copy the whole thing from the original drive before attempting any of this, and I would highly recommend doing the same, since it's easy to make mistakes, as I did. The original pool I accidentally wrote over was named backup, so zdb started seeing multiple versions of the different backup pools and couldn't figure out why the metadata didn't all match. I had to set the ZFS kernel module parameter vdev_validate_skip=1 to get the pool to import, but that just imported the newer backup pool (not the one I wanted). Note that this happened even though I specified the exact path to the drive I wanted to import from: when forcing imports this way, the import seemed to ignore my specification completely and draw on a different configuration from a device that wasn't listed in the command.
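For reference, on ZFS on Linux a module parameter like vdev_validate_skip can usually be changed at runtime by writing to a file under /sys/module/zfs/parameters/ (that path is an assumption about a standard OpenZFS on Linux install). The Python sketch below uses a hypothetical zfs_param helper that sets the parameter only for the duration of an import attempt and restores the previous value afterwards, since leaving label validation disabled is risky. It needs root, and it does not stop the import from picking up labels on other attached devices.

```python
from contextlib import contextmanager
from pathlib import Path

# Assumed location of OpenZFS module parameters on Linux.
ZFS_PARAM_DIR = Path("/sys/module/zfs/parameters")


@contextmanager
def zfs_param(name: str, value: int, param_dir: Path = ZFS_PARAM_DIR):
    """Temporarily set a ZFS module parameter, restoring the old value on exit.

    Yields the previous value (as the string read from the parameter file).
    """
    path = param_dir / name
    old = path.read_text().strip()
    path.write_text(f"{value}\n")
    try:
        yield old
    finally:
        # Restore even if the import attempt inside the block raised.
        path.write_text(f"{old}\n")
```

In practice you would wrap the forced import attempt in with zfs_param("vdev_validate_skip", 1): so the parameter goes back to its previous value whether or not the import succeeds. The param_dir argument exists only so the helper can be exercised against an ordinary directory.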
Luckily, I had made another clone of the drive, so I could attempt another run. However, labelfix is also smart and seems to read the current drive configuration, so it picked up on the fact that I had two old drives with "corrupted data" from the first backup pool. Unfortunately, that corruption meant the "fixed" label listed the pool not only as DEGRADED but also as FAULTED, and thus un-importable.
At this point I realized I simply had to unplug all the old drives and work without them in the system at all to avoid corrupting recovery attempts.
Unfortunately, labelfix only seems to fix things once, so I am now on to clone #3 of this drive (which is currently copying from my first backup clone). Once that cloning process finishes, I'll run labelfix without any of the other old drives present, and hopefully I'll get a DEGRADED pool that I can then import.
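Since this whole process depends on working from good clones, it may be worth checking that a fresh clone actually matches its source before running labelfix on it. Here is a minimal Python sketch, using a hypothetical first_mismatch helper, that compares two images chunk by chunk and reports where they first differ. It assumes the source reads cleanly; if the original has unreadable sectors, ddrescue's mapfile is the authoritative record of what was copied.

```python
def first_mismatch(src: str, dst: str, chunk: int = 4 * 1024 * 1024):
    """Return the start offset of the first chunk that differs, or None if identical.

    Reads both images sequentially; a length difference shows up as a mismatch
    in the chunk where the shorter image ends.
    """
    with open(src, "rb") as a, open(dst, "rb") as b:
        offset = 0
        while True:
            x = a.read(chunk)
            y = b.read(chunk)
            if x != y:
                return offset
            if not x:  # both reached end of file together: images match
                return None
            offset += len(x)
```

For example, first_mismatch("/path/to/clone2.img", "/path/to/clone3.img") (placeholder paths) returning None means clone #3 is byte-for-byte identical to the clone it was copied from.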
Best Answer
Apparently there is an incompatibility between Solaris 11 and Solaris 10 regarding ZFS encryption and ZFS deduplication. Although these are supported from ZFS pool versions 31 and 21 respectively on Solaris 11, they aren't supported on Solaris 10 at all, even though Solaris 10 supports up to version 32 :(
Hint: run zpool upgrade -v for a list of supported features.
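Following the hint, it is safer to check whether a feature is listed by name than to infer it from the highest pool version number. This Python sketch assumes each version row of the zpool upgrade -v output is a version number followed by a description; I haven't checked that layout on every release, so treat the parsing as an assumption to verify against your system's actual output. The parse_versions and supports helpers are hypothetical, and the text would come from capturing the command's output.

```python
from __future__ import annotations

import re

# Assumed row shape: optional spaces, a version number, then a description.
ROW = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def parse_versions(text: str) -> dict[int, str]:
    """Map pool version -> description from zpool upgrade -v style output.

    Header and separator lines don't start with a number, so they are skipped.
    """
    versions = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            versions[int(m.group(1))] = m.group(2)
    return versions


def supports(text: str, keyword: str) -> bool:
    """True if any listed description mentions keyword (case-insensitive)."""
    kw = keyword.lower()
    return any(kw in desc.lower() for desc in parse_versions(text).values())
```

Running supports(output, "dedup") on each host, rather than comparing version numbers, is the kind of check that would have flagged the Solaris 10 gap described above.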