Linux – File is mysteriously empty. Options to recover

data-recovery files linux

I have seen several posts about recovering deleted files, but this situation is different. My wife had a file called Journal.odt in which she kept a lot of important personal information such as special memories about our kids. The other day when she tried to open it in OpenOffice it complained about the format. I had her hit cancel and back out. When I cat the file it is completely empty. ls says the file is 0 bytes.

Had she accidentally selected all of the text in the file, hit backspace and saved it, there would still be the OpenOffice meta information in the file.

I immediately shut her laptop down to prevent making any more changes to disk until I can think of something to do.

I have done some complicated things in the past such as using dd to recover raw text off the disk but I have no idea what to do here. Since odt files aren't flat text I can't just pipe the whole disk through grep.

Any suggestions would be greatly appreciated.

Also if anyone has any insight as to what might have gone wrong I would love to hear it.

Thanks

Best Answer

If you are using an ext3 file system, try following Carlo Wood's HOWTO.
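Whether the partition really is ext3 can be checked before going any further. A minimal sketch, assuming a Linux system where /proc/mounts is readable; the function name fs_type_of is made up for this example:

```shell
# Print the filesystem type of the mount point given as $1.
# $2 is the mount table to read; it defaults to /proc/mounts.
fs_type_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' "${2:-/proc/mounts}"
}
```

For example, fs_type_of /home prints the type of whatever is mounted at /home, and prints nothing if that path is not itself a mount point.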

In a few words:

  • Use ext3grep $IMAGE --ls --inode 2 | grep your_file to find the file you are looking for (where $IMAGE is your partition, for example /dev/sda2; you'll need ext3grep installed).
  • Find the file system block that contains the file's inode.
  • Find all journal descriptors referencing that block, and look through them, newest first, for an older copy of the inode that still lists its data blocks.
  • Copy the data block with dd.
  • Edit the file to delete the trailing zeroes.
  • cat the file wherever you want.
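The journal step produces a list of "sequence number, block" pairs, and the copies from the newest transactions are the ones to inspect first. A rough sketch of that ordering, assuming the pairs follow a line containing "referencing block" as in the HOWTO output quoted further down; newest_first_blocks is a hypothetical helper name, not part of ext3grep:

```shell
# Read `ext3grep $IMAGE --journal --block N` output on stdin and print the
# journal block numbers holding copies of block N, newest transaction first.
newest_first_blocks() {
    awk '
        /referencing block/ { seen = 1; next }      # header line; pairs follow
        seen {
            # Accept one or more "sequence block" pairs per line.
            for (i = 1; i + 1 <= NF; i += 2)
                if ($i ~ /^[0-9]+$/ && $(i + 1) ~ /^[0-9]+$/)
                    print $i, $(i + 1)
        }
    ' | sort -k1,1nr | awk '{ print $2 }'
}
```

It would be used as ext3grep $IMAGE --journal --block 622598 | newest_first_blocks, and each block number it prints can then be examined with --print --block until a copy with a zero deletion time turns up.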

From the source:

"The chapter Manual recovery example

In the following example we will manually recover a small file. Only partial output is given in order to save space and to make the example more readable.

Using ext3grep $IMAGE --ls --inode we find the name of the file that we want to recover:

$ ext3grep $IMAGE --ls --inode 2 | grep carlo
3 end d 195457 D 1202352103 Thu Feb 7 03:41:43 2008 drwxr-xr-x carlo

$ ext3grep $IMAGE --ls --inode 195457 | grep ' bin$' | head -n 1
34 35 d 309540 D 1202352104 Thu Feb 7 03:41:44 2008 drwxr-xr-x bin

$ ext3grep $IMAGE --ls --inode 309540 | grep start_azureus
9 10 r 309631 D 1202351093 Thu Feb 7 03:24:53 2008 rrwxr-xr-x start_azureus

Obviously, inode 309631 is erased and we have no block numbers for this file:

$ ext3grep $IMAGE --print --inode 309631
[...]
Inode is Unallocated
Group: 19
Generation Id: 2771183319
uid / gid: 1000 / 1000
mode: rrwxr-xr-x
size: 0
num of links: 0
sectors: 0 (--> 0 indirect blocks).

Inode Times:
Accessed:       1202350961 = Thu Feb 7 03:22:41 2008
File Modified:  1202351093 = Thu Feb 7 03:24:53 2008
Inode Modified: 1202351093 = Thu Feb 7 03:24:53 2008
Deletion time:  1202351093 = Thu Feb 7 03:24:53 2008

Direct Blocks:

Therefore, we will try to look for an older copy of it in the journal. First, we find the file system block that contains this inode:

$ ext3grep $IMAGE --inode-to-block 309631 | grep resides
Inode 309631 resides in block 622598 at offset 0xf00.

Then we find all journal descriptors referencing block 622598:

$ ext3grep $IMAGE --journal --block 622598
[...]
Journal descriptors referencing block 622598:
4381294 26582
4381311 28693
4381313 28809
4381314 28814
4381321 29308
4381348 30676
4381349 30986
4381350 31299
4381374 32718
4381707 1465
4381709 2132
4381755 2945
4381961 4606
4382098 6073
4382137 6672
4382138 7536
4382139 7984
4382140 8931

This means that the transaction with sequence number 4381294 has a copy of block 622598 in block 26582, and so on. The largest sequence number, at the bottom, should be the last data written to disk and thus block 8931 should be the same as the current block 622598. In order to find the last non-deleted copy, one should start at the bottom and work upwards.

If you try to print such a block, ext3grep recognizes that it's a block from an inode table and will print the contents of all 32 inodes in it. We only wish to see inode 309631 however; so we use a smart grep:

$ ext3grep $IMAGE --print --block 8931 | grep -A15 'Inode 309631'
--------------Inode 309631-----------------------
Generation Id: 2771183319
uid / gid: 1000 / 1000
mode: rrwxr-xr-x
size: 0
num of links: 0
sectors: 0 (--> 0 indirect blocks).

Inode Times:
Accessed:       1202350961 = Thu Feb 7 03:22:41 2008
File Modified:  1202351093 = Thu Feb 7 03:24:53 2008
Inode Modified: 1202351093 = Thu Feb 7 03:24:53 2008
Deletion time:  1202351093 = Thu Feb 7 03:24:53 2008

Direct Blocks:

This is indeed the same as we saw in block 622598. Next we look at smaller sequence numbers until we find one with a 0 Deletion time. The first one that we find (bottom up) is block 6073:

$ ext3grep $IMAGE --print --block 6073 | grep -A15 'Inode 309631'
--------------Inode 309631-----------------------
Generation Id: 2771183319
uid / gid: 1000 / 1000
mode: rrwxr-xr-x
size: 40
num of links: 1
sectors: 8 (--> 0 indirect blocks).

Inode Times:
Accessed:       1202350961 = Thu Feb 7 03:22:41 2008
File Modified:  1189688692 = Thu Sep 13 15:04:52 2007
Inode Modified: 1189688692 = Thu Sep 13 15:04:52 2007
Deletion time:  0

Direct Blocks: 645627

The above is automated and can be done much faster with the command line option --show-journal-inodes. This option will find the block that the inode belongs to, then find all copies of that block in the journal, and subsequently print only the requested inode from each of these blocks (each of which contains 32 inodes, as you know), eliminating duplicates:

$ ext3grep $IMAGE --show-journal-inodes 309631
Number of groups: 75
Minimum / maximum journal block: 1115 / 35026
Loading journal descriptors... done
Journal transaction 4381435 wraps around, some data blocks might have been lost of this transaction.
Number of descriptors in journal: 30258; min / max sequence numbers: 4379495 / 4382264
Copies of inode 309631 found in the journal:

--------------Inode 309631-----------------------
Generation Id: 2771183319
uid / gid: 1000 / 1000
mode: rrwxr-xr-x
size: 0
num of links: 0
sectors: 0 (--> 0 indirect blocks).

Inode Times:
Accessed:       1202350961 = Thu Feb 7 03:22:41 2008
File Modified:  1202351093 = Thu Feb 7 03:24:53 2008
Inode Modified: 1202351093 = Thu Feb 7 03:24:53 2008
Deletion time:  1202351093 = Thu Feb 7 03:24:53 2008

Direct Blocks:

--------------Inode 309631-----------------------
Generation Id: 2771183319
uid / gid: 1000 / 1000
mode: rrwxr-xr-x
size: 40
num of links: 1
sectors: 8 (--> 0 indirect blocks).

Inode Times:
Accessed:       1202350961 = Thu Feb 7 03:22:41 2008
File Modified:  1189688692 = Thu Sep 13 15:04:52 2007
Inode Modified: 1189688692 = Thu Sep 13 15:04:52 2007
Deletion time:  0

Direct Blocks: 645627

The file is indeed small: only one block. We copy this block with dd as shown before:

$ dd if=$IMAGE bs=4096 count=1 skip=645627 of=block.645627
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0166104 seconds, 247 kB/s

and then edit the file to delete the trailing zeroes, or copy the first 40 bytes (the given size of the file):

$ dd if=block.645627 bs=1 count=40 of=start_azureus
40+0 records in
40+0 records out
40 bytes (40 B) copied, 0.000105397 seconds, 380 kB/s

$ cat start_azureus
cd /usr/src/azureus/azureus
./azureus &

Recovered!"
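The two dd steps at the end of the quoted example can be combined into one. A minimal sketch, assuming the file occupies a single 4096-byte block as it does there; recover_one_block is a hypothetical helper name, and head -c is assumed to be available (it is in GNU coreutils and BusyBox):

```shell
# Usage: recover_one_block IMAGE BLOCK SIZE OUTFILE
# Copy file-system block BLOCK from IMAGE and keep only its first SIZE bytes,
# which drops the zero padding at the end of the block.
recover_one_block() {
    dd if="$1" bs=4096 skip="$2" count=1 2>/dev/null | head -c "$3" > "$4"
}
```

With the values from the example this would be recover_one_block $IMAGE 645627 40 start_azureus. The output file should go on a different disk than the one being recovered, so that nothing on it is overwritten.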
