Advanced NTFS partition file recovery techniques for damaged drives (IO errors)

data-recovery ntfs

I've recently suffered a maddeningly small but quite important amount of damage to a hard drive on an ESXi host, affecting a couple of VMs. There's a file that I would very much like to recover, and of course it was somehow left off of my regular backup. The most recent copies are 6 months old. Turns out I need that… oops.

Details:

1) I have used ddrescue (AWESOME tool) within a Parted Magic bootable ISO to recover 99.98% of the VM's drive in question. Unfortunately, the errors appear to be almost entirely of RECENT file writes… so of course they're exactly the sectors I need to recover most.
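
To see exactly which sectors make up the missing 0.02%, the ddrescue mapfile can be summarized. This is a minimal sketch, assuming the rescue was run with a mapfile (older ddrescue versions call it a logfile); the sample contents below are made up for illustration and the function names are hypothetical.

```python
def parse_mapfile(text):
    """Return (pos, size, status) tuples from the text of a ddrescue mapfile."""
    blocks = []
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if not seen_status_line:
            # The first non-comment line is the current position/status, not a block.
            seen_status_line = True
            continue
        blocks.append((int(fields[0], 0), int(fields[1], 0), fields[2]))
    return blocks


def unrecovered(blocks, sector_size=512):
    """List (first_sector, sector_count) for every block not marked '+' (finished)."""
    return [
        (pos // sector_size, -(-size // sector_size))  # ceiling division for the count
        for pos, size, status in blocks
        if status != "+"
    ]


# Illustrative sample only, not real data from the drive in question.
SAMPLE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00010000     +               1
#      pos        size  status
0x00000000  0x00008000  +
0x00008000  0x00000400  -
0x00008400  0x00007C00  +
"""

if __name__ == "__main__":
    for first, count in unrecovered(parse_mapfile(SAMPLE)):
        print(f"sector {first}: {count} sector(s) still unread")
```

Any status other than '+' ('?' non-tried, '*' non-trimmed, '/' non-scraped, '-' bad sector) is treated here as still missing.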

2) The drive gives IO errors on bad sector reads, but it occasionally SUCCEEDS in reading a previously bad sector! So, recovery is still possible. Slightly more often than that, it will have some kind of major malfunction and spin the drive down and back up. Oh, and about 1/4 of those spin-downs won't come back up. (A hard power cycle is required; shutdown won't function.) Lastly, just about every bad sector read comes with a nice audible clicking sound.

3) The important VM disk is NTFS formatted.

4) I can (usually) mount the damaged NTFS volume read-only, and I can (slightly less often) navigate to the folder that contains the file I need. However, the file in question appears to always give an IO error when I do an 'ls' of the folder. The other files in the folder do not give an IO error.

5) I've tried using ntfsinfo/etc… which sounds like exactly what I need… but it won't open the partition at all. (Frustrating, since 'mount' usually will)

6) The file is an Excel 2003-era XLS file, so I'm not sure I can come up with any strings to search the raw disk image for. (Possibly parts of the 6 month old version?)
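
One string that is available: Excel 97-2003 XLS files are OLE2 compound documents, which begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. Below is a rough sketch of scanning a raw image for sectors that start with that signature. It assumes the file data begins on a sector boundary (true for non-resident NTFS files, whose data starts on a cluster boundary); the function name and parameters are hypothetical.

```python
import io

# Signature at the start of every OLE2 compound file (XLS, DOC, PPT, MSI, ...).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole2_headers(stream, sector_size=512, chunk_sectors=2048):
    """Yield byte offsets of sectors that begin with the OLE2 signature."""
    offset = 0
    chunk_bytes = sector_size * chunk_sectors
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        # Only sector-aligned positions are checked, which keeps the scan cheap.
        for i in range(0, len(chunk), sector_size):
            if chunk[i:i + len(OLE2_MAGIC)] == OLE2_MAGIC:
                yield offset + i
        offset += len(chunk)


if __name__ == "__main__":
    # Synthetic 8-sector "image" with one signature planted at sector 5.
    image = bytearray(512 * 8)
    image[512 * 5:512 * 5 + 8] = OLE2_MAGIC
    print(list(find_ole2_headers(io.BytesIO(bytes(image)))))
```

Every hit is only a candidate, since other Office-era formats share the signature; comparing the bytes after each hit against the 6-month-old copy may help narrow things down.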

I'd really like to use something like the facilities of debugfs. However, from the man pages it appears the ntfs tools could do the work if only they could be made to open the partition. In particular, I am wondering if the IO errors might be purely within the metadata for the file, and if the directory record could be restored well enough to copy the file contents off. As a last resort, whatever partial file contents I can retrieve would be great.

I've written (relatively simple) kernel modules before, so I could compile a special NTFS module with more debug info enabled (or added). (The file is worth at least a few days of tinkering to try to recover… plus I'm learning cool stuff in the process)

Any pointers?

EDIT:

More drive error information:

The /var/log/messages is showing a lot of NTFS-fs errors of course… but I finally bothered to translate the unhandled sense code message I usually get: sense key 0x3, ASC=0x11, ASCQ=0x4. (which appears to translate to UNRECOVERED READ ERROR – AUTO REALLOCATE FAILED).

When the drive spins down, I see a "scsi0: * BusLogic BT-958 Initialized" message. I'm not sure if it's the Linux SCSI driver, the ESXi driver, or the drive itself that decides to spin the drive down. If it was the Linux driver, then perhaps I could modify the driver to avoid spinning down. This whole ddrescue thing is made massively more painful by these power-cycle-requiring spindowns.

EDIT2:

Using the "end_request: I/O error, dev sda, sector 7238859" log message right after I 'ls' the directory containing the file in question, I've targeted my ddrescue operation to that sector. I currently plan to take my chances and WRITE that sector back to the live disk if this succeeds. Perhaps I can slowly rebuild my way to the file in question this way. Still, most recoverable bad sectors are recovered in under 20 retries… this one is over 150 so far… *sigh*

EDIT3:

The sector error from 'ls' on the file I need is entirely uncooperative (1000+ tries overnight and no luck). I'm hoping that's just metadata when you do an 'ls' ? 🙂

I do have most of a ddrescue copy, but that doesn't mount (or mounts without files). The damaged drive mounts correctly most of the time… maybe IO errors on the damaged drive force 'mount' to fall back to the mirror that works?
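
One way to test the metadata guess is to read the NTFS boot sector from the rescued image and see where $MFT and $MFTMirr start. This is a minimal sketch based on the standard NTFS boot sector layout; it assumes a cluster size small enough to be stored directly in the sectors-per-cluster byte, and the function name and returned keys are hypothetical.

```python
import struct


def parse_ntfs_boot_sector(boot):
    """Extract basic geometry and the $MFT / $MFTMirr start from an NTFS boot sector."""
    if boot[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector (OEM ID mismatch)")
    (bytes_per_sector,) = struct.unpack_from("<H", boot, 0x0B)
    sectors_per_cluster = boot[0x0D]
    # Total sectors, $MFT start cluster, $MFTMirr start cluster (all 64-bit little-endian).
    total_sectors, mft_lcn, mftmirr_lcn = struct.unpack_from("<QQQ", boot, 0x28)
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "total_sectors": total_sectors,
        # Sector numbers are relative to the start of the partition, not the disk.
        "mft_first_sector": mft_lcn * sectors_per_cluster,
        "mftmirr_first_sector": mftmirr_lcn * sectors_per_cluster,
    }


if __name__ == "__main__":
    # Synthetic boot sector for demonstration; real values come from the image.
    boot = bytearray(512)
    boot[3:11] = b"NTFS    "
    struct.pack_into("<H", boot, 0x0B, 512)
    boot[0x0D] = 8
    struct.pack_into("<QQQ", boot, 0x28, 1000000, 786432, 2)
    print(parse_ntfs_boot_sector(bytes(boot)))
```

The failing sector from the log is numbered from the start of sda, so the partition's start sector has to be subtracted before comparing. The $MFT can also be fragmented, so this only gives its first extent. As far as I know, $MFTMirr typically duplicates only the first few MFT records (the system files), so it would explain a more reliable mount but would not hold a spare copy of an ordinary file's record.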

EDIT4:

I've given up for now, pending further suggestions. I've removed the drive and rebuilt the box. I'll keep the drive around in case something comes up.

Best Answer

A few notes from my experience:

  1. (the cause) If you hear an unusual sound during hd access attempts, and problems don't occur at (more or less) just random disk locations, then the root cause is most probably on the disk surface (not the electronics) - unfortunately, that's the sad scenario. If it were "just" the electronics, you might have had a chance to recover most or even all of your data.
  2. (bad sectors) If you haven't already, search the web for the disk manufacturer's bootable diagnostic/recovery tool, download it, boot it, run a deep test and let it try to remap bad sectors - that's the best among the free methods. Note that bad areas have a tendency to grow - so even if you manage to catch one chunk of your file on the 2314th read attempt, chances are that those attempts just made nearby bad areas grow, effectively decreasing the chances of recovering other parts of the file.
  3. (recovering NTFS) Nothing can fix an NTFS filesystem as well as MS Windows' native tools. If the NTFS image is not mountable (also make sure that you were trying to mount the partition, not the entire disk!), you can try things like testdisk under Linux, but if those fail, Windows' chkdsk can help. If you have Windows installed in a virtual machine, you can convert the raw image obtained from ddrescue to a format supported by that virtual machine (such as VDI or VMDK), attach it to the VM, and boot Windows in command-line mode to fix the filesystem. If you use VirtualBox, the command to convert such an image is VBoxManage convertfromraw <filename> <outputfile>, optionally with --format VDI|VMDK|VHD to select the output format.
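
The conversion step from note 3 could be scripted roughly as follows. This is only a sketch that builds the VBoxManage command line described above; the file names are placeholders and the helper function is hypothetical.

```python
import subprocess

ALLOWED_FORMATS = ("VDI", "VMDK", "VHD")


def convertfromraw_cmd(raw_image, output_file, fmt="VDI"):
    """Build the VBoxManage command that converts a raw image to a VM disk format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    return ["VBoxManage", "convertfromraw", raw_image, output_file, "--format", fmt]


if __name__ == "__main__":
    cmd = convertfromraw_cmd("rescue.img", "rescue.vmdk", "VMDK")  # placeholder names
    print(" ".join(cmd))
    # Uncomment to actually run the conversion (requires VirtualBox to be installed):
    # subprocess.run(cmd, check=True)
```

Work on a copy of the ddrescue image, since chkdsk will modify whatever disk it is pointed at.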