I have a hard disk that was formatted and had its OS reinstalled.
The problem is that it wasn't booting before the format, and the data backup I made beforehand is, for some reason, missing some of the files.
There are Microsoft Word *.docx files missing.
Now I'm trying to recover the files with Puran File Recovery, but it doesn't have a pre-built scan entry for the *.docx extension.
Puran File Recovery has an option to create custom entries, and I found the start-bytes signature on filesignatures.net, so I am now able to find many *.docx headers on the hard disk.
My problem now is that I can't find anywhere what the end bytes of *.docx files are, which I need in order to recover some of the files.
Best Answer
A .docx file is just a .zip file.
The end of a Zip file is indicated by the end of central directory record (EOCD). The length of the EOCD is variable because it can contain a comment up to 65535 bytes long: the fixed 22-byte part of the record finishes with a 2-byte comment length field, and the comment itself follows.
EOCD layout reference: Wikipedia » Zip (file format) » End of central directory record (EOCD)
You can get the end of a Zip file by looking for 0x06054b50 (the beginning of the EOCD; the value is stored little-endian, so the bytes on disk are 50 4B 05 06), then counting 16 bytes past that 4-byte signature. Set the next two bytes to 0x0000 to ignore the comment, and you should now have the end of a valid Zip file.
Note: This does not take file system fragmentation into account. Your recovery approach will not work if the .docx/.zip file was fragmented on the disk, because the data between the signatures you're finding would be broken up. You would need some information from the file system in order to piece together fragmented files; beginning and end signatures don't have this information.
PhotoRec is a tool I've used before that has some tricks to figure out how to piece together fragmented files. Crucially for you, PhotoRec has built-in support for Zip files, so you might want to try TestDisk/PhotoRec if your current signature search strategy isn't working for you.
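As a rough illustration of the EOCD search described above, here is a minimal Python sketch that carves one candidate Zip out of a raw byte buffer. The function name `carve_zip` and the constant names are hypothetical, chosen only for this example. The check that the central directory ends exactly where the EOCD begins is an extra assumption that is not part of the steps above: it only holds for plain, contiguous, non-Zip64 archives with nothing prepended, and is used here to skip EOCD signatures that belong to some other file. Like the manual approach, it does nothing about fragmentation.

```python
import struct
from typing import Optional

# Signatures as they appear on disk (the 32-bit values are stored little-endian).
LOCAL_HEADER_SIG = b"PK\x03\x04"  # 0x04034b50, start of a Zip/.docx file
EOCD_SIG = b"PK\x05\x06"          # 0x06054b50, start of the EOCD record
EOCD_FIXED_SIZE = 22              # 4-byte signature + 16 bytes + 2-byte comment length


def carve_zip(image: bytes, start: int) -> Optional[bytes]:
    """Return a candidate Zip beginning at `start`, or None if no plausible EOCD follows.

    Hypothetical helper: assumes the archive is stored contiguously in `image`.
    """
    pos = image.find(EOCD_SIG, start)
    while pos != -1:
        if pos + EOCD_FIXED_SIZE <= len(image):
            # Fixed EOCD fields: signature, two disk numbers, two record counts,
            # central directory size, central directory offset, comment length.
            (_sig, _disk, _cd_disk, _n_disk, _n_total,
             cd_size, cd_offset, _comment_len) = struct.unpack_from("<IHHHHIIH", image, pos)
            # Assumption: in an unfragmented archive the central directory
            # ends exactly where the EOCD begins.
            if start + cd_offset + cd_size == pos:
                carved = bytearray(image[start:pos + EOCD_FIXED_SIZE])
                carved[-2:] = b"\x00\x00"  # zero the comment length to drop the comment
                return bytes(carved)
        pos = image.find(EOCD_SIG, pos + 1)
    return None
```

A caller would locate each `LOCAL_HEADER_SIG` offset in the disk image, pass it as `start`, and write any non-`None` result to a `.docx` file to see whether Word or an unzip tool can open it.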