Linux – Damaged HDD recovery woes

hard drive, linux, windows

I've got a dying disk, it's an emotional time, but I've managed to either recover or retrieve backups for most of the data. I've got a few questions though.

When the disk first started dying, boot times went through the roof because it wasn't responding properly at POST and at the Windows splash screen. It would show up in Disk Management in Win 7 sporadically, and it seemed that too much activity killed it and made it disappear again.

When it was available it would transfer at Kb/s rather than Mb/s.

I managed to get Ubuntu 11.10 installed on my portable disk and boot into that. The dying disk was slow at first, and Ubuntu threw all sorts of tantrums because the disk wouldn't respond properly. But when it played nice it did so for a long time, and speeds were sometimes in the region of 30-50Mb/s. I managed to get all my important stuff off this way (very happy), but when I boot back into Windows it still can't see the disk at all. Back in Ubuntu, there it is, ready to use.

Question

It's a multi-part question:

  • What could have gone wrong with the drive to cause this strange behaviour (no bad sectors, but still dying)?
  • Does Linux really deal with [insert drive error here] so much better than Windows that it was the difference between life and death for my data?
  • Why does [insert drive error here] seem to deteriorate over time / usage?

With all that's gone on I'm thinking more and more mechanical, but mechanical disk errors are commonplace for OSs and are (in my experience) nicely dealt with. Not this farce.

Specs:

Win 7 Pro x64 on separate unaffected disk.
Ubuntu 11.10 x64 on portable USB disk.
Dying drive: WD 2TB Caviar Green

Thanks

[update]

Oddness continues: Windows now refuses to boot with the dying drive plugged in. I can hear a clunk from it every 2 seconds while Windows sits at the splash screen. It doesn't sound like what I've heard from a dying drive before, but that was a while ago. It sounds like it's trying to start the disk every two seconds and the disk is refusing. The confusing thing is that Windows isn't giving up. Am I missing something? Shouldn't Windows accept defeat and just boot without the drive, or at least show some sort of error, instead of sitting there polling it every 2 seconds for 20 mins until I got bored of waiting and killed it?

[update 2]

OK, I found something else a little odd last night that makes me not want to give up on this drive just yet. Even the Grub prompt at boot (separate issue) can read files on the device with next to no lag, and an Ubuntu live CD seemingly has good access to the device too. Windows, however, still refuses to boot at all when this device is plugged in. I've seen a few mechanical drive errors before, but never ones that behaved inconsistently across OSs.

Best Answer

There are many things that could be going wrong with your drive, from simple head faults to an overheating motor or even overheating or dying controller chips.

To me your fault sounds a lot like either the controller or the motor is failing in some way.

Integrated circuits, such as the controller that takes magnetically read data from the head mechanism and converts it to the electrical data your computer expects, are very delicate and with age and use can wear out the same as mechanical components wear out. Either it got hot too many times, has a small imperfection that didn't affect it to begin with but time hasn't done it any good, or it's simply old.

When electronic parts get old they fail in surprisingly similar ways to mechanical parts. Sometimes they "kinda" work and you can get them to move data, but in a slow and painful way as it has a lot of error correcting logic that is having to kick in to do anything. When they're in that state extra heat can make the chip get into an illegal state and require a power cycle (just like your computer processor) or if you are lucky it may soft-boot itself without the OS knowing.

Similarly, a failing drive motor may cause problems somewhat like the ones you are experiencing. The motor tries to spin up at boot, but because the bearing or something else is going, one of two things happens. Either it draws too much power, putting a heavy load on the power regulators and controller so that they overheat and effectively "reboot" again, or it simply takes too long to spin up, causing the controller to think it has a fault and to "let it rest" a moment before trying again.

Linux may simply be more resilient about dealing with failing hardware. There are different ideologies behind the operating systems and how Linux interacts with the hardware could well make it more tolerant of hardware that is spewing errors at it while Windows expects the hardware to just do its job.
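To make the "tolerant" idea a bit more concrete, here is a minimal Python sketch of a copy loop that carries on past read errors instead of giving up. This is only an illustration of the general approach, not how either OS actually handles a bad drive. The function name copy_skipping_bad_blocks and its parameters are made up for this example, and it assumes a failed read shows up as an OSError.

```python
import io


def copy_skipping_bad_blocks(src, dst, total_size, block_size=4096):
    """Copy total_size bytes from src to dst, one block at a time.

    Blocks whose read raises OSError are replaced with zeros so the
    copy can continue. Returns the byte offsets where a read failed.
    (Hypothetical helper for illustration only.)
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            # Assumption: an unreadable block surfaces as OSError.
            # Note where it happened and write zeros as a placeholder.
            bad_offsets.append(offset)
            data = b"\x00" * want
        if not data:
            break  # source ended earlier than expected
        dst.seek(offset)
        dst.write(data)
        offset += len(data)
    return bad_offsets
```

In practice src would be the failing disk or partition opened in binary mode and dst an image file on a healthy disk. A dedicated tool such as GNU ddrescue does this job far more carefully (retries, logging, smarter ordering), so treat the above as a picture of the idea rather than a recovery tool.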

Lastly:

Why does [insert drive error here] seem to deteriorate over time / usage

The same can be asked of any piece of equipment in the world, be it cars, trains, microchips or even biological organisms and the answer would always be along the lines of "because everything experiences some kind of deterioration over time."

We can't make frictionless bearings for motors, we can't make a microprocessor that doesn't waste energy as heat, and we can't make cells that live forever. The work that these items do costs them a tiny fraction of their lifespan each time, and a lot of the time it's just a guessing game as to which part in any given item will die first.
