How to recover a lost pdf file


I accidentally lost a pdf file during the following process:

  • I was running the pdf application PDFXCView under Wine on Ubuntu 18.04 to open a pdf file on an ext4 filesystem.

  • Then I moved the pdf file somewhere else with mv.

  • Then I edited the pdf file that was still open in PDFXCView. To save the edited file, I had to choose "Save as…", navigate to the file's current path and try to overwrite it. But PDFXCView failed to overwrite the file; instead, it made the file disappear and then aborted.

Here are some attempts.

  1. If it helps, I remember the pathname of the lost pdf file.

  2. I couldn't back up the partition with dd, since I don't have
    another hard drive big enough to hold the whole partition.

  3. I tried debugfs, following
    https://unix.stackexchange.com/a/80285:

     $ sudo debugfs -w /dev/sda4
     debugfs: lsdel
    
     Inode  Owner  Mode    Size      Blocks   Time deleted
    22549259   1000 100600    141      1/     1 Sat Apr  2 09:14:06 2016
    1 deleted inodes found.
    
    debugfs:  logdump -i 22549259
    22549259: File not found by ext2_lookup    
    

    My file was lost recently, not deleted in 2016, so I am not sure
    debugfs found the correct inode.

  4. The answer at https://unix.stackexchange.com/a/98700/ suggests using

    grep -a -C 500 'known pattern' /dev/sda | tee /tmp/recover
    

    to recover a text file that contains a known pattern.

    A while ago, I created the lost pdf file by concatenating several
    smaller pdf files with pdftk, and I still have those smaller files.
    Viewing the binary content of one of them with cat smaller.pdf | less,
    I can see a readable, pdf-specific string:

    /URI (http://flask.pocoo.org/docs/1.0/api/#flask.Flask.logger)
    

    So I tried:

    sudo grep -a -C 500 'http://flask.pocoo.org/docs/1.0' /dev/sda4 >  /tmp/test/recover
    

    However, the smaller files and the lost file all contain this string, and -C 500 is an arbitrary amount of context (500 "lines", which have no fixed length in binary data) that has nothing to do with where a file begins and ends. So I am not sure it can produce useful results.
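A more targeted variant of the grep idea can be sketched in Python: instead of taking 500 lines of context, record the exact byte offset of each match, then look for the nearest %PDF- header before it and the first %%EOF trailer after it. This is only a sketch under stated assumptions: the function names find_offsets and pdf_bounds and the 64 MiB search window are hypothetical choices for illustration, and it assumes the lost file's blocks are still contiguous on disk, which ext4 does not guarantee.

```python
from typing import BinaryIO, Iterator, Optional, Tuple

HEADER = b"%PDF-"    # every pdf file starts with this marker
TRAILER = b"%%EOF"   # end-of-file marker (incrementally saved pdfs have several)


def find_offsets(stream: BinaryIO, pattern: bytes,
                 chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the absolute byte offset of every occurrence of `pattern`,
    reading `stream` sequentially in chunks so a whole partition never
    has to fit in memory."""
    overlap = len(pattern) - 1  # tail kept so matches across chunk borders are found
    base = 0                    # absolute offset of buf[0]
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        buf += chunk
        i = buf.find(pattern)
        while i != -1:
            yield base + i
            i = buf.find(pattern, i + 1)
        # The kept tail is shorter than `pattern`, so no match is reported twice.
        keep = min(overlap, len(buf))
        base += len(buf) - keep
        buf = buf[len(buf) - keep:]


def pdf_bounds(stream: BinaryIO, hit: int,
               window: int = 64 << 20) -> Optional[Tuple[int, int]]:
    """Guess the (start, end) byte range of the pdf containing `hit`:
    the nearest %PDF- before it and the first %%EOF after it, each searched
    within `window` bytes (a hypothetical limit). Returns None if either
    marker is missing."""
    lo = max(0, hit - window)
    stream.seek(lo)
    start = stream.read(hit - lo).rfind(HEADER)
    if start == -1:
        return None
    stream.seek(hit)
    end = stream.read(window).find(TRAILER)
    if end == -1:
        return None
    return lo + start, hit + end + len(TRAILER)
```

To use it, open the partition read-only as root (e.g. open("/dev/sda4", "rb")), collect all hits first with list(find_offsets(...)), and only then call pdf_bounds for each one, because both functions move the same file position. The hits inside the smaller source pdfs will show up too; the concatenated original is the candidate whose range is noticeably larger. If the file was saved incrementally, the first %%EOF may cut it short.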

How else could I try to recover the pdf file?

Thanks!

Best Answer

Definitely start by leaving the partition with the data alone, if at all possible (you would be surprised what you can recover even a month later, if it is not your main system partition). Then proceed with foremost (I originally mentioned magicrescue, but foremost performs just as well and has a ready recipe for pdf):

sudo apt update && sudo apt install foremost
sudo foremost -v -t pdf -i [PATH] -o ~/pdfrecovery/

# -t - Filetype [in our case pdf]
# -i - Input file [can be as wide as /dev/sdX or more detailed]
# -o - Output Directory

I just ran it for a few seconds on one of my /dev/sdX drives and it pulled 370 pdf files. The recovered files lose their original names and look like 14348984.pdf, so it pays to narrow the input with -i as much as you can.
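Since the carved files have no names, one way to find the right one among hundreds is to search them for a string you know the lost file contains, such as the flask URI from the question. A minimal Python sketch, assuming the output directory from the command above; files_containing is a hypothetical helper, and it searches recursively because foremost may place results in a per-type subfolder:

```python
from pathlib import Path
from typing import List


def files_containing(root: str, needle: bytes) -> List[Path]:
    """Return every *.pdf under `root` (searched recursively) whose raw bytes
    contain `needle`, largest first."""
    hits = [p for p in Path(root).expanduser().rglob("*.pdf")
            if p.is_file() and needle in p.read_bytes()]
    # The concatenated original should be larger than any single source pdf
    # that happens to contain the same string.
    return sorted(hits, key=lambda p: p.stat().st_size, reverse=True)
```

For example, files_containing("~/pdfrecovery", b"http://flask.pocoo.org/docs/1.0"). This only matches text stored uncompressed; if the string ended up inside a compressed stream, a missing hit does not prove the file was not recovered.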

Good luck.


Update

Your second option is testdisk/photorec, which in your case may be easier since you know the file's path. testdisk and photorec have some caveats: if you are not careful (and happen to confirm the dialogs asking whether you want to apply changes), they can damage the disk. But if you take it slow, testdisk may be more appropriate, and faster, since it will likely show you a folder tree with a node corresponding to your missing file. If you do not find your file with foremost within, say, 2 hours, post a comment and I will provide a secondary testdisk approach.

Update 2

When I just tested it, testdisk crushed foremost at locating a specific deleted file. It preserved the folder tree and filenames perfectly, so there was no need to carve out every *.pdf file. The two approaches differ substantially, and if the file is very important, I would definitely use both testdisk and foremost to recover the same file, to make sure I end up with a complete, uncorrupted copy.
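Cross-checking the two recovered copies can be sketched in Python: compare their SHA-256 digests and run a cheap completeness heuristic on each. The helper names are hypothetical, and the "%%EOF within the last 1024 bytes" test is an assumed rule of thumb, not a full validation. Differing hashes do not necessarily mean corruption, since a carver may keep trailing bytes after %%EOF; opening both copies, or running a checker such as qpdf --check, is the real test.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def looks_complete(path: str) -> bool:
    """Heuristic only: starts with %PDF- and has %%EOF near the end."""
    data = Path(path).read_bytes()
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def compare_recoveries(a: str, b: str) -> dict:
    """Summarize two recovered copies of what should be the same pdf."""
    return {
        "identical": sha256_of(a) == sha256_of(b),
        "a_complete": looks_complete(a),
        "b_complete": looks_complete(b),
    }
```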
