Even though the question was asked 10 months ago, this answer may still be relevant: depending on a few factors, the recovery cycle might still be running! No pun intended.
The reason is that an accurate time estimate is almost impossible; however, you can sometimes get a rough idea, as follows.
The most obvious reason is that you can't predict how long the drive will take to read a bad sector, and if you want ddrescue to read and retry every single one, the process can take a very long time.
For example, I'm currently running a recovery on a small 500GB drive that has been going for over 2 weeks, and I probably have a few days left.
But mine is a more complicated situation because the drive is encrypted, and to read anything successfully I have to make sure I get every sector that holds partition tables, boot sectors and other critical parts of the disk. I'm using techniques in addition to ddrescue to improve my chances with the bad sectors. IOW, your unique situation matters a lot in determining the time to completion.
By estimate of "loops", if you mean the number of retries, that's something you determine through the parameters you use. If you mean the total number of passes, that's easily determined by reading about the algorithm in the manual:
man ddrescue (section "Algorithm: How ddrescue recovers the data")
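For example, a common pattern (a sketch only; the device name, image file and map file are placeholders) is a quick first pass followed by a retry pass over the bad areas:
# first pass: grab everything that reads easily, skip the slow scraping phase
ddrescue -n /dev/sdX drive.img drive.map
# second pass: retry each remaining bad sector up to 3 times, using direct disc access
ddrescue -d -r3 /dev/sdX drive.img drive.map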
I'll speak specifically to the numbers in the screen captures you provided. Other situations may have other factors at play, so take this information as a general guideline.
In the sample you've provided, take a look at ddrescue's running status screen. The total "estimate" of the problem (the rescue domain) is given by "errsize": the amount of data yet to be read. In the sample it is 345 GB.
On the next line down, to the right, is the "average rate". In the sample it is 583 kB/s.
If the "average rate" was to remain close to steady, this means you have 7 more days to go. 345 GB / (583 kb * 60*60*24) = 7.18
However, the problem is that you can't rely on that 583 kB/s. In fact, the deeper you go into the recovery, the slower the drive gets, since it's reading progressively tougher areas and doing more and more retries.
So the time to finish keeps stretching out. How much depends on how badly the drive is damaged.
The sample you've provided shows that the last "successful read" was over 10 hours ago. In other words, ddrescue hasn't really got anything off the drive for 10+ hours. That suggests some or all of that 345 GB may be unrecoverable, which is very bad news for you.
In contrast, my second 500GB drive, which had only just started giving S.M.A.R.T. errors, was copied disk to disk (with the log file on another drive) and the whole operation took about 8-9 hours. The last part of it slowed down, but that was still bearable. The very bad drive, as noted above, is now well past 2 weeks of work on 500GB and still has about 4-5% remaining to recover.
HTH and YMMV
Looking at the partition table for /dev/loop0 and the disk image sizes reported for /dev/loop0 and /dev/loop1, I'm inclined to suggest that the two disks were simply bolted together and then the partition table was built for the resulting virtual disk:
Disk /dev/loop0: 298.1 GiB, 320072933376 bytes, 625142448 sectors
Device Boot Start End Sectors Size Id Type
/dev/loop0p1 * 2048 4196351 4194304 2G 7 HPFS/NTFS/exFAT
/dev/loop0p2 4196352 1250273279 1246076928 594.2G 7 HPFS/NTFS/exFAT
and
Disk /dev/loop1: 298.1 GiB, 320072933376 bytes, 625142448 sectors
If we take the two disks at 298.1 GiB and 298.1 GiB we get 596.2 GiB total. If we then take the sizes of the two partitions 2G + 594.2G we also get 596.2 GiB. (This assumes the "G" indicates GiB.)
You have already warned that you cannot get mdadm to recognise the superblock information, so purely on the basis of the disk partition labels I would attempt to build the array like this:
mdadm --build /dev/md0 --raid-devices=2 --level=0 --chunk=128 /dev/loop0 /dev/loop1
cat /proc/mdstat
I have specified a chunk size of 128 KiB to match the chunk size described by the metadata still present on the disks.
If that works you can then proceed to access the partition in the resulting RAID0.
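As a quick sanity check first (my own suggestion, assuming fdisk is available), the assembled array should present the same partition table that /dev/loop0 showed:
fdisk -l /dev/md0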
ld=$(losetup --show --find --offset=$((4196352*512)) /dev/md0)
echo loop device is $ld
mkdir -p /mnt/dsk
mount -t ntfs -o ro $ld /mnt/dsk
We already have a couple of loop devices in use, so I've avoided assuming the name of the next free loop device and instead asked the losetup command to tell me the one it used; this is put into $ld. The offset of 4196352 sectors (each of 512 bytes) corresponds to the offset into the image of the second partition. We could equally have omitted the offset from the losetup command and added it to the mount options.
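If the mount fails, it's worth confirming that the loop device really does start with an NTFS filesystem before suspecting the array geometry. A quick, non-destructive check (assuming blkid and file are installed):
blkid $ld
file -s $ld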
Best Answer
I prefer badblocks in destructive write mode for this. It writes, it continues doing so when it hits errors, and finally it tells you where those errors were; that block list may help you decide what to do next (Will It Blend?).
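A typical invocation looks something like this (a sketch; the device name is a placeholder, and -w will destroy all data on it):
# -w destructive write test, -s show progress, -v verbose,
# -b 4096 use 4 KiB blocks, -o write the bad block list to a file
badblocks -w -s -v -b 4096 -o badlist.txt /dev/sdX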
As for what's left on the disk afterwards: note that it's not really random data. The test pattern is repetitive, so if you skipped ahead 1 MiB you'd see the same pattern again. badblocks will also try to verify by reading the data back in, so if you have a disk that claims to be writing successfully but returns wrong data on readback, it will find those errors too. (Make sure no other processes write to the disk while badblocks is running, to avoid false positives.)
Of course, with a badly broken disk this may take too long: there is no code that would make badblocks skip over defective areas entirely. The only way you could achieve that with badblocks would be to use a much larger block size.
I'm not sure whether ddrescue does this any better; it's supposed to work in the other direction (recover as much data as fast as possible). You can do it manually with dd/ddrescue/badblocks by specifying the first/last block...
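As a sketch of that manual approach (the block numbers here are invented for illustration): badblocks accepts an optional last block and first block after the device name, so you can restrict a run to a range and step over a hopeless region by hand:
# test only blocks 2097152 through 4194303; note the argument order is last_block first_block
badblocks -w -s -v -b 4096 /dev/sdX 4194303 2097152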