Imagine a file created with:
truncate -s1T file
echo test >> file
truncate -s2T file
I now have a 2 TiB (tebibyte) file that occupies only 4 KiB on disk, with "test\n"
written in the middle.
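To confirm that such a file really is sparse, you can compare its apparent size with the space actually allocated to it. Below is a minimal sketch. It assumes that st_blocks is counted in 512-byte units, which holds on Linux and most Unix systems. The helper name apparent_and_allocated is hypothetical:

```python
import os
import sys


def apparent_and_allocated(path):
    """Return (apparent size, bytes actually allocated on disk) for path.

    Hypothetical helper. Assumes st_blocks counts 512-byte units, as on Linux
    and most Unix systems, regardless of the file system block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    for path in sys.argv[1:]:
        size, alloc = apparent_and_allocated(path)
        print(f"{path}: {size} bytes apparent, {alloc} bytes allocated")
```

For the file above on a file system with 4 KiB blocks, you would expect something like 2199023255552 bytes apparent (2 × 2^40) against 4096 allocated. The exact allocated figure depends on the file system.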
How can I recover that "test"
efficiently, that is, without having to read the whole file?
tr -d '\0' < file
would give me the result, but reading 2 TiB of zeros would take hours.
What I'd like is something that outputs only the non-sparse parts of the file (so, above, only "test\n",
or more likely the 4 KiB block allocated on disk that stores that data).
There are APIs to find out which parts of the file are allocated (FIBMAP, FIEMAP, SEEK_HOLE, SEEK_DATA…), but which tools expose them?
A portable solution (at least to the OSes that support those APIs) would be appreciated.
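Where no ready-made tool is available, SEEK_DATA and SEEK_HOLE are easy to drive from a small script. The following is a minimal sketch and makes no claim to be an existing tool. It alternates lseek(SEEK_DATA) and lseek(SEEK_HOLE) to find each data region and copies only those regions to stdout. It assumes Python 3.3+ on a platform that defines os.SEEK_DATA/os.SEEK_HOLE (Linux, Solaris, FreeBSD, macOS). On file systems without real hole support, Linux reports the whole file as data, so the script still works but reads everything. The function names are hypothetical:

```python
import errno
import os
import sys


def data_segments(fd):
    """Yield (offset, length) for each data region of fd (hypothetical helper)."""
    end = os.fstat(fd).st_size
    offset = 0
    while offset < end:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data at or after offset
                return
            raise
        # There is always an implicit hole at EOF, so SEEK_HOLE succeeds here.
        stop = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, stop - start
        offset = stop


def dump_data(path, out, chunk=1 << 20):
    """Copy only the non-hole regions of path to the binary stream out."""
    fd = os.open(path, os.O_RDONLY)
    try:
        for start, length in data_segments(fd):
            os.lseek(fd, start, os.SEEK_SET)
            remaining = length
            while remaining:
                buf = os.read(fd, min(chunk, remaining))
                if not buf:  # file shrank underneath us
                    break
                out.write(buf)
                remaining -= len(buf)
    finally:
        os.close(fd)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        dump_data(path, sys.stdout.buffer)
```

Data regions are reported at the file system's allocation granularity. For the example file, the output would typically be "test\n" followed by the zero padding of its block. Piping it through tr -d '\0' as above then becomes cheap, for example with python3 dumpdata.py file | tr -d '\0' (the script name is only an example).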
Best Answer
The best I could come up with so far is (ksh93, using filefrag from e2fsprogs 1.42.9 (some older versions have a different API), on extent-based file systems on Linux):
filefrag reports the extents of the file using the FIEMAP ioctl on the file systems that support it.
The "unwritten" part takes care of files that have been fallocated but not written to (they are not sparse, but are still full of zeros I'm not interested in).
Recent versions of bsdtar or star can use some of those APIs to generate a tar file that identifies the sparse sections as such. That would make for a more portable solution, but one would then have to parse the generated tar file to extract the non-sparse sections.