Data Recovery – How to Find Lost Files After a ddrescue Attempt

data-recovery ddrescue hard-disk

I am in the process of salvaging data from a 1 TB failing drive (asked about it in Procedure to replace a hard disk?). I have done ddrescue from a system rescue USB with a resulting error size of 557568 B in 191 errors, probably all in /home (I assume what it calls "errors" are not bad sectors, but consecutive sequences of them).

Now, the several guides I've seen suggest running e2fsck on the new disk, and I expected this to somehow find that some files have been assigned "blank sectors/blocks", so that I would at least know which files could not be saved whole. But no errors were found at all (I ran it without -y to make sure I didn't miss anything). Now I am running it again with -c, but at 95% no errors have been found so far; I guess I have a new drive with some normal-looking files with zeroed or random pieces inside, undetectable until one day I open them with the corresponding software, or Linux Mint needs them.

Can I do anything with the old/new drives in order to obtain a list of possibly corrupted files? I don't know how many there could be, since those 191 errors could span several files, but at least the total size is not big; I am mostly concerned about a big bunch of old family photos and videos (1+ MB each); the rest is probably irrelevant or was backed up recently.

Update: the new pass of e2fsck did give something new of which I understand nothing:

Block bitmap differences:  +231216947 +(231216964--231216965) +231216970 +231217707 +231217852 +(231217870--231217871) +231218486
Fix<y>? yes
Free blocks count wrong for group #7056 (497, counted=488).                    
Fix<y>? yes
Free blocks count wrong (44259598, counted=44259589).
Fix<y>? yes

Best Answer

You'll need the block numbers of all encountered bad blocks (ddrescue should have given you a list, I hope you saved it), and then you'll need to find out which files make use of these blocks (see e.g. here). You may want to script this if there are a lot of bad blocks.

e2fsck doesn't help: it just checks the consistency of the file system itself, so it will only act if the bad blocks contain "administrative" file system information.

The bad blocks in the files will just be empty.
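
A minimal sketch of the first step of such a script, assuming the bad areas are read from the ddrescue mapfile (called a logfile in older versions) and that after a finished run only areas with status "-" (bad sector) matter. The function name list_bad_areas is made up for illustration. Positions and sizes are printed in decimal bytes, relative to whatever device or partition ddrescue was reading from:

```shell
# List the unreadable areas recorded in a ddrescue mapfile as
# "start_byte size_bytes" pairs in decimal.
# list_bad_areas is a hypothetical helper name, not part of ddrescue.
list_bad_areas() {
    # $1: path to the mapfile
    while read -r pos size status rest; do
        case "$pos" in '#'*|'') continue ;; esac     # skip comments and blank lines
        case "$size" in 0x*) ;; *) continue ;; esac  # skip the current_pos status line
        [ "$status" = "-" ] || continue              # keep only bad-sector areas
        echo "$((pos)) $((size))"                    # hex -> decimal
    done < "$1"
}
```

Other unfinished statuses ("?", "*", "/") could be added to the status check if the rescue was interrupted before completing.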

Edit

Ok, let's figure out the block size thingy. Let's make a trial filesystem with 512-byte device blocks:

$ dd if=/dev/zero of=fs bs=512 count=200
$ /sbin/mke2fs fs

$ ll fs
-rw-r--r-- 1 dirk dirk 102400 Apr 27 10:03 fs

$ /sbin/tune2fs -l fs
...
Block count:              100
...
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192

So the filesystem block size is 1024, and we've 100 of those filesystem blocks (and 200 512-byte device blocks). Rescue it:

$ ddrescue -b512 fs fs.new fs.log
GNU ddrescue 1.19
Press Ctrl-C to interrupt
rescued:    102400 B,  errsize:       0 B,  current rate:     102 kB/s
   ipos:     65536 B,   errors:       0,    average rate:     102 kB/s
   opos:     65536 B, run time:       1 s,  successful read:       0 s ago
Finished                                     

$ cat fs.log
# Rescue Logfile. Created by GNU ddrescue version 1.19
# Command line: ddrescue fs fs.new fs.log
# Start time:   2017-04-27 10:04:03
# Current time: 2017-04-27 10:04:03
# Finished
# current_pos  current_status
0x00010000     +
#      pos        size  status
0x00000000  0x00019000  +

$ printf "%i\n" 0x00019000
102400

So the hex numbers in the ddrescue log are in bytes, not in blocks of any kind. Finally, let's see what debugfs uses. First, make a file and find its contents:

$ sudo mount -o loop fs /mnt/tmp
$ sudo chmod go+rwx /mnt/tmp/
$ echo 'abcdefghijk' > /mnt/tmp/foo
$ sudo umount /mnt/tmp

$ hexdump -C fs
...
00005400  61 62 63 64 65 66 67 68  69 6a 6b 0a 00 00 00 00  |abcdefghijk.....|
00005410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

So the byte address of the data is 0x5400. Convert this to 1024-byte filesystem blocks:

$ printf "%i\n" 0x5400
21504
$ expr 21504 / 1024
21

and let's also try the block range while we are at it:

$ /sbin/debugfs fs
debugfs 1.43.3 (04-Sep-2016)
debugfs:  testb 0
testb: Invalid block number 0
debugfs:  testb 1
Block 1 marked in use
debugfs:  testb 99
Block 99 not in use
debugfs:  testb 100
Illegal block number passed to ext2fs_test_block_bitmap #100 for block bitmap for fs
Block 100 not in use
debugfs:  testb 21
Block 21 marked in use
debugfs:  icheck 21
Block   Inode number
21      12
debugfs:  ncheck 12
Inode   Pathname
12      //foo

So that works out as expected, except that block 0 is invalid, probably because with 1024-byte blocks the first data block of the filesystem is block 1. So, for your byte address 0x30F8A71000 from ddrescue, assuming you worked on the whole disk and not a partition, we subtract the byte address of the partition start:

210330128384 - 7815168 * 512 = 206328762368

Divide that by the tune2fs block size to get the filesystem block (note that since multiple physical, possibly damaged, blocks make up a filesystem block, numbers needn't be exact multiples):

206328762368 / 4096 = 50373233.0

and that's the block you should test with debugfs.
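
The arithmetic above can be wrapped in a small shell function so it can be repeated for every bad area. This is only a sketch: byte_to_fsblock is a made-up name, and the partition start (7815168 sectors of 512 bytes) and the 4096-byte block size are simply the numbers used in this answer.

```shell
# Convert a whole-disk byte address (as found in a ddrescue mapfile)
# into a filesystem block number suitable for debugfs.
# byte_to_fsblock is a hypothetical helper name.
byte_to_fsblock() {
    # $1: byte address on the disk (decimal or 0x-prefixed hex)
    # $2: partition start, in sectors
    # $3: sector size in bytes
    # $4: filesystem block size in bytes (from tune2fs -l)
    echo $(( ($1 - $2 * $3) / $4 ))
}

byte_to_fsblock 0x30F8A71000 7815168 512 4096   # prints 50373233

# Possible follow-up on the rescued partition (/dev/sdXN is a placeholder):
#   debugfs -R "icheck 50373233" /dev/sdXN
#   debugfs -R "ncheck INODE_FROM_ICHECK" /dev/sdXN
```

Integer division rounds down, which is what is wanted here: any byte inside a filesystem block maps to that block.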

Related Solutions

Data Recovery – Find if ext4 Block is in Inode Table and Extract from Journal

All right, so for the first question it turns out the debugfs stats command tells what the starting blocks for every section of a group are. In addition, I guessed that inumbers had to be consecutive and increasing, so basic addition of the offset into the inode table and the imap command gave me the first inumbers; it also confirmed my suspicion about the last bad sector, where my block group calculations indicated it was in the wrong group.

byte address  block      group  what                   first inumber
0x8B00020000  145752096  4448   inode table block 0    36438017
0x8B00027000  145752103  4448   inode table block 7    36438129
0x8B0002C000  145752108  4448   inode table block 12   36438209
0x8B00209000  145752585  4448   inode table block 489  36445841
0x8B0029A000  145752730  4449   inode table block 122  36448161

Since a block is 4096 bytes and each inode table entry is 256 bytes, there are 16 inodes per block. So I now have all 80 lost inode table entries by inumber.
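
The "first inumber" column can be reproduced with a little shell arithmetic. A sketch only: the function name is made up, and the value of 8192 inodes per group is an assumption inferred from the table (4448 * 8192 + 1 = 36438017); the real value is reported by tune2fs -l or debugfs stats.

```shell
# First inode number stored in a given block of a group's inode table.
# first_inode_in_table_block is a hypothetical helper name.
first_inode_in_table_block() {
    # $1: block group number
    # $2: index of the block within that group's inode table
    # $3: inodes per group (assumed 8192 here)
    # $4: inode size in bytes
    # $5: filesystem block size in bytes
    # Inode numbers start at 1, hence the trailing "+ 1".
    echo $(( $1 * $3 + $2 * ($5 / $4) + 1 ))
}

first_inode_in_table_block 4448 7 8192 256 4096   # prints 36438129
```

Each lost block then covers that inode number and the 15 following ones.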

Now let's turn to the journal. I wrote a small tool that dumps information in each block of the journal. Since the journal superblock was missing, there were two pieces of information that I needed for this that were lost:

whether the journal held 64-bit block numbers
whether the journal used version 3 checksums

Fortunately, if I forced one (or both) of these switches on, some of the descriptor blocks in the journal overflowed their blocks, proving that those flags were not set.

One awk script (fulllog.awk) later, I have a log of the form

0x0002A000 - descriptors
        0x0002B000 -> block 159383670
        0x0002C000 -> block 159383671
        0x0002D000 -> block 0
        0x0002E000 -> block 155189280
        0x0002F000 -> block 195559440
        0x00030000 -> block 47
        0x00031000 -> block 195559643
        0x00032000 -> block 195568036
        0x00033000 -> block 159383672
0x0002B000 - invalid/data block
0x0002C000 - invalid/data block
0x0002D000 - invalid/data block
0x0002E000 - invalid/data block
0x0002F000 - invalid/data block
0x00030000 - invalid/data block
0x00031000 - invalid/data block
0x00032000 - invalid/data block
0x00033000 - invalid/data block
0x00034000 - commit record
        commit time: 2014-12-25 16:53:13.703902604 -0500 EST

With this, another awk script (dumpallfor.awk) dumps all the blocks:

byte address  block      number of journaled blocks
0x8B00020000  145752096  6
0x8B00027000  145752103  10
0x8B0002C000  145752108  206
0x8B00209000  145752585  1
0x8B0029A000  145752730  0

So that last block is truly lost :( With any luck I can find out what files were there with debugfs's ncheck command.

So I have a bunch of blocks. And they all appear to differ! Now what?

I could go by the revocation records, but I can't seem to parse that structure meaningfully. I could go by the commit record timestamps, but before I try that, I want to see just how each inode table block differs. So I wrote another quick program (diff.go) to find that out.

For the most part, files that do differ differ only in timestamps, so we can just choose the file with the latest timestamps. We'll do that later. For all other files, we get this:

36438023 - size differs
36438139 - OSD1 (file version high dword) differs
36438209 - OSD1 differs

Hm, that's not good... The file with differing size will be a problem, and I have no idea what to do about the two OSD1 files. I also tried using debugfs's ncheck to see what the files were, but we don't have a match.

I then found out which block dumps have the latest timestamps for now (same repo, latest.go). The important thing to note is that I had the blocks scanned in chronological order by commit time. This is not necessarily the same as numerical order by block number; the journal is not always stored in chronologically increasing order.

As it turns out, however, the newest block (by commit time) is indeed the one with the latest timestamps!

Let's try these latest blocks and see if we can recover anything from them.

sudo dd if=BLOCKFILE of=DDRESCUEIMG bs=1 seek=BYTEOFFSET conv=notrunc
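
For anyone wanting to try this kind of in-place patch on scratch files first, here is a self-contained sketch. It assumes the offset is block-aligned, so it uses bs=4096 with seek counted in blocks instead of bs=1 with a byte offset; the file names and the block number 5 are placeholders.

```shell
# Scratch demonstration: patch one 4096-byte block into an image in place.
img=$(mktemp)
blk=$(mktemp)
dd if=/dev/zero of="$img" bs=4096 count=8 2>/dev/null   # dummy 8-block image
head -c 4096 /dev/zero | tr '\0' 'A' > "$blk"           # stand-in for a recovered block
block=5                                                 # target block number

# conv=notrunc keeps the rest of the image intact; seek counts in units of bs.
dd if="$blk" of="$img" bs=4096 seek="$block" conv=notrunc 2>/dev/null

# Read the block back and compare it with what was written.
dd if="$img" bs=4096 skip="$block" count=1 2>/dev/null | cmp -s - "$blk" \
    && echo "block $block patched"
```

Without conv=notrunc, dd would truncate the image right after the written block, so it is worth testing the command on a copy before touching the real ddrescue image.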

After that my home directory is back!

Now let's find out what those three differing files were...

Inode   Pathname
36438023    /pietro/.cache/gdm/session.log
36438209    /pietro/.config/liferea
36438139    /pietro/.local/share/zeitgeist/fts.index

The only important thing there is Liferea's configuration directory, but I don't think that was corrupted; it was one of the OSD1-differing ones.

And let's find out about those 16 inodes in the final block, the one that we could not recover:

Inode   Pathname
36448176    /pietro/k2
36448175    /pietro/Downloads/sOMe4P7.jpg
36448174    /pietro/Downloads/picture.png
36448164    /pietro/Downloads/tumblr_nfjvg292T21s4pk45o1_1280.png
36448169    /pietro/Downloads/METROID Super Zeromission v.2.3+HARD_v2.4.zip
36448165    /pietro/Downloads/tumblr_mrfex1kuxa1sbx6kgo1_500.jpg
36448173    /pietro/Downloads/1*-vuzP4JAoPf9S6ZdHNR_Jg.jpeg
36448162    /pietro/.cache/upstart/gnome-settings-daemon.log.6.gz
36448163    /pietro/.cache/upstart/dbus.log.7.gz
36448171    /pietro/.cache/upstart/gnome-settings-daemon.log.3.gz
36448161    /pietro/.local/share/applications/Knytt Underground.desktop
36448166    /pietro/Documents/Screenshots/Screenshot from 2014-12-03 15:47:29.png
36448170    /pietro/Documents/Screenshots/Screenshot from 2014-12-03 16:51:26.png
36448172    /pietro/Documents/Screenshots/Screenshot from 2014-12-03 19:08:54.png
36448168    /pietro/Documents/transactions/premiere to operating transaction 4305747926.pdf
36448167    /pietro/Documents/transactions/transaction 4315883542.pdf

In short:

a text file with only one or two things in that I could get back by brute force since I know that it has a date stamp and something that's also in my chat logs
some images downloaded from the internet; if I can't get the URLs back from Firefox's history then I can use photorec
a ROM hack that I can easily get on the Internet again =P
log files; no loss here
the .desktop file for a Steam game
screenshots; I can get these back with photorec assuming gnome-screenshot added the datestamp as metadata
bank account transaction records; if I can't get them from the bank I could probably recover them with photorec

So not casualtyless but not a total loss, and I learned more about ext4 in the process. Thanks anyway!

UPDATE

Might as well put this out there:

NOT YET     /pietro/k2
FOUND       /pietro/Downloads/sOMe4P7.jpg
NOT YET     /pietro/Downloads/picture.png
FOUND       /pietro/Downloads/tumblr_nfjvg292T21s4pk45o1_1280.png
GOOGLEIT    /pietro/Downloads/METROID Super Zeromission v.2.3+HARD_v2.4.zip
FOUND       /pietro/Downloads/tumblr_mrfex1kuxa1sbx6kgo1_500.jpg
FOUND       /pietro/Downloads/1*-vuzP4JAoPf9S6ZdHNR_Jg.jpeg
UNNEEDED    /pietro/.cache/upstart/gnome-settings-daemon.log.6.gz
UNNEEDED    /pietro/.cache/upstart/dbus.log.7.gz
UNNEEDED    /pietro/.cache/upstart/gnome-settings-daemon.log.3.gz
UNNEEDED    /pietro/.local/share/applications/Knytt Underground.desktop
NOT YET     /pietro/Documents/Screenshots/Screenshot from 2014-12-03 15:47:29.png
NOT YET     /pietro/Documents/Screenshots/Screenshot from 2014-12-03 16:51:26.png
NOT YET     /pietro/Documents/Screenshots/Screenshot from 2014-12-03 19:08:54.png
NOT YET     /pietro/Documents/transactions/premiere to operating transaction 4305747926.pdf
NOT YET     /pietro/Documents/transactions/transaction 4315883542.pdf

And in case I'm not weird enough, the downloaded pictures were:

sOMe4P7.jpg (a parody of the Law & Order title card with "& KNUCKLES" added to it)
tumblr_nfjvg292T21s4pk45o1_1280.png (screenshot of this tweet from J. K. Rowling)
tumblr_mrfex1kuxa1sbx6kgo1_500.jpg (picture of a "Windows did not shut down successfully." error message on a billboard at what appears to be some sporting event)
1*-vuzP4JAoPf9S6ZdHNR_Jg.jpeg (this comic)

These were all shared by friends in chats.

I guess I'll keep this updated? (Not like it would make a difference...) I know I can recover everything; the only question is when =P

Which sector size shall I choose to run ddrescue with direct access on an Advanced Format drive

I've exchanged emails with the author of ddrescue, Antonio Diaz, and he told me that the correct parameter to use with an "advanced format" drive (i.e., a drive with 4096-byte physical sectors, but 512-byte "logical sectors") is:

-b4096

If you wanted it to read just one 4096-byte sector at a time (slow!) then you would also specify:

-c1

Antonio is not active on StackExchange, but he supports ddrescue via this email mailing list:

https://www.mail-archive.com/bug-ddrescue@gnu.org/

If you send your email to bug-ddrescue@gnu.org then your email will appear on that summary page, as will his answer, in nicely organized form (but without your email address shown, of course). Additionally, you may search on that page to try to find previous discussions of your issue or question, before bothering Antonio. (He is a very busy man, so please don't waste his time!)

The reason that your ddrescue logfile contains a 512-byte "bad" area is that you initially ran ddrescue with the default sector size of 512 bytes. That's not disastrous, but if ddrescue thinks the drive has 512-byte sectors, and a read is issued that returns 0 bytes of data due to a read error, then ddrescue assumes that only the first 512 bytes are unreadable, and makes no assumption about the rest. So only 512 bytes are marked as bad in the logfile.
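
To see which 4096-byte physical sector such a 512-byte entry falls into, the area can be widened to physical-sector boundaries. This is a sketch under the assumption that a failed read means the whole physical sector is suspect, which ddrescue itself does not assume; physical_sector_range is a made-up name.

```shell
# Widen a bad area from the logfile to whole physical sectors.
# physical_sector_range is a hypothetical helper name.
physical_sector_range() {
    # $1: start byte of the bad area (decimal or 0x-prefixed hex)
    # $2: size of the bad area in bytes
    # $3: physical sector size (4096 on Advanced Format drives)
    local start end
    start=$(( $1 / $3 * $3 ))                    # round start down
    end=$(( ($1 + $2 + $3 - 1) / $3 * $3 ))      # round end up
    echo "$start $(( end - start ))"
}

physical_sector_range 0x1200 512 4096   # prints "4096 4096"
```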
