Converting sparse file to non-sparse in place

files sparse-files

On Linux, given a sparse file, how can it be made non-sparse in place?
It could be copied with cp --sparse=never ..., but if the file is, say, 10G with a 2G hole
(that is, only 8G is actually allocated), how can the filesystem be made to allocate the remaining 2G without copying the original 8G to a new file?

Best Answer

On the face of it, it's a simple dd:

dd if=sparsefile of=sparsefile conv=notrunc bs=1M

That reads the entire file and writes the entire contents back to it, which fills the holes but also rewrites all the data that was already allocated.

In order to only write the hole itself, you first have to determine where those holes are. You can do that using either filefrag or hdparm:

filefrag:

# filefrag -e sparsefile
Filesystem type is: 58465342
File size of sparsefile is 10737418240 (2621440 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0.. 1048575:  187357696.. 188406271: 1048576:            
   1:  1572864.. 2621439:  200704128.. 201752703: 1048576:  188406272: last,eof
sparsefile: 2 extents found

hdparm:

# hdparm --fibmap sparsefile

sparsefile:
 filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0 1498861568 1507250175    8388608
  6442450944 1605633024 1614021631    8388608

This example file is, as you say, 10G in size with a 2G hole. It has two extents, the first covering blocks 0-1048575, the second 1572864-2621439, which means that the hole is 1048576-1572863 (in 4k blocks, as shown by filefrag). The info shown by hdparm is the same, just displayed differently: the first extent covers 8388608 512-byte sectors starting from 0, so it spans bytes 0-4294967295, and the second extent starts at byte 6442450944, so the hole is bytes 4294967296-6442450943.

Note that you may be shown considerably more extents if the file is fragmented. Unfortunately, neither command shows the holes directly, and I don't know of one that does, so you have to deduce them from the logical offsets shown.
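
That deduction can be scripted. The following is a minimal sketch, not a standard tool: holes_from_filefrag is a hypothetical helper name, and it assumes the filefrag -e layout shown above (extents listed in ascending logical order, total block count in parentheses on the "File size" line). It prints one line per hole: first block, last block, and number of blocks.

```shell
# Read `filefrag -e` output on stdin and print the holes, one per line,
# as: first_block last_block block_count (in filefrag's block units).
# Usage (hypothetical): filefrag -e sparsefile | holes_from_filefrag
holes_from_filefrag() {
    awk '
        BEGIN { next_block = 0 }
        # "File size of X is N (M blocks of B bytes)": remember M.
        /^File size of / {
            n = $0
            sub(/^.*\(/, "", n)
            total = n + 0
        }
        # Extent lines look like "  1:  1572864.. 2621439:  ...".
        /^ *[0-9]+: *[0-9]+\.\./ {
            gsub(/\.\.|:/, " ")      # leaves: ext start end pstart pend ...
            if ($2 > next_block)
                print next_block, $2 - 1, $2 - next_block
            next_block = $3 + 1
        }
        # A hole may also sit between the last extent and end of file.
        END {
            if (total > next_block)
                print next_block, total - 1, total - next_block
        }
    '
}
```

For the filefrag output above, this prints 1048576 1572863 524288.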

Now, filling that hole (blocks 1048576 up to, but not including, 1572864) with dd as shown above can be done by adding appropriate (identical) seek/skip values and a count. Note that bs= was changed to match the 4k blocks used by filefrag above. (For bs=1M, you'd have to adapt the seek/skip/count values to reflect 1M sized blocks.)

dd if=sparsefile of=sparsefile conv=notrunc \
   bs=4k seek=1048576 skip=1048576 count=$((1572864-1048576))
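
For the bs=1M variant, the block numbers need rescaling. Here is a small sketch of that arithmetic for the example hole. The variable names are illustrative, and it assumes the hole starts and ends on 1M boundaries; if it does not, staying with bs=4k is simpler.

```shell
fs_bs=4096              # block size reported by filefrag
dd_bs=$((1024 * 1024))  # desired dd block size (1M)
hole_start=1048576      # first block of the hole, in fs_bs units
hole_end=1572864        # first block after the hole, in fs_bs units

ratio=$((dd_bs / fs_bs))    # filesystem blocks per dd block (256 here)
if [ $((hole_start % ratio)) -ne 0 ] || [ $((hole_end % ratio)) -ne 0 ]; then
    echo "hole is not aligned to $dd_bs bytes; use bs=$fs_bs instead" >&2
else
    seek=$((hole_start / ratio))
    count=$(( (hole_end - hole_start) / ratio ))
    # Only print the command; run it yourself after checking the numbers.
    echo "dd if=sparsefile of=sparsefile conv=notrunc bs=1M seek=$seek skip=$seek count=$count"
fi
```

With the values above this gives seek=4096, skip=4096 and count=2048, that is, 2G written starting at the 4G mark.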

While you could fill holes from /dev/zero instead of reading the hole from the file itself (which also just yields zeroes), it is safer to read from the sparsefile anyway, so you won't corrupt your data if you get an offset wrong.

In newer versions of GNU dd, you may stick to a larger blocksize and specify all values in bytes:

dd if=sparsefile of=sparsefile conv=notrunc bs=1M \
   iflag=skip_bytes,count_bytes oflag=seek_bytes \
   seek=4294967296 skip=4294967296 count=$((6442450944-4294967296))

filefrag after running that:

# sync
# filefrag -e sparsefile 
Filesystem type is: 58465342
File size of sparsefile is 10737418240 (2621440 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0.. 1572863:  187357696.. 188930559: 1572864:            
   1:  1572864.. 2621439:  200704128.. 201752703: 1048576:  188930560: last,eof
sparsefile: 2 extents found

Due to fragmentation, it's still two extents. However, the logical offsets show that this time, there is no hole, so the file is no longer sparse.
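
A quicker check that avoids reading extent lists is to compare the apparent size with the allocated space. The sketch below uses GNU stat; is_sparse is a hypothetical helper name. Filesystems with compression or delayed allocation can report fewer allocated blocks than expected, so treat the result as a hint and run sync first, as above.

```shell
# Exit 0 if FILE has fewer bytes allocated than its apparent size,
# 1 if it is fully allocated, 2 if stat fails.
is_sparse() {
    size=$(stat --format='%s' -- "$1") || return 2    # apparent size in bytes
    blocks=$(stat --format='%b' -- "$1") || return 2  # allocated blocks
    bsize=$(stat --format='%B' -- "$1") || return 2   # bytes per allocated block
    [ $((blocks * bsize)) -lt "$size" ]
}

# Example (hypothetical file name):
#   sync; is_sparse sparsefile && echo "still sparse" || echo "fully allocated"
```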

Naturally, this dd solution is a very manual approach. If you need this on a regular basis, it would be easy to write a small program that fills such gaps. If it already exists as a standard tool, I haven't heard of it yet.
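
A minimal sketch of such a helper follows. It is not a standard tool: fill_holes is a hypothetical name, and it expects the holes on standard input, one per line, as the first and last block of each hole in 4k units (worked out from the filefrag output). It reads each range from the file itself, so a wrong offset rewrites existing data with itself, and it compares checksums before and after as a safety net. The checksums read the whole file twice, so drop them if that is too slow for large files.

```shell
# Usage: fill_holes FILE < holes.txt
# holes.txt: one hole per line, "first_block last_block" in 4096-byte blocks;
# any further fields on a line are ignored.
fill_holes() {
    file=$1
    before=$(sha256sum < "$file") || return 1
    while read -r first last rest; do
        # Read the range from the file and write it back to the same offset.
        dd if="$file" of="$file" conv=notrunc bs=4096 \
           skip="$first" seek="$first" count=$((last - first + 1)) \
           2>/dev/null || return 1
    done
    after=$(sha256sum < "$file") || return 1
    # Succeed only if the file contents are unchanged.
    [ "$before" = "$after" ]
}

# Example for the file above (hypothetical invocation):
#   echo "1048576 1572863" | fill_holes sparsefile
```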


There is a tool after all. fallocate seems to work, after a fashion:

fallocate -l $(stat --format="%s" sparsefile) sparsefile

However, at least in the case of XFS, while it does allocate physical space for the file, it does not actually zero it out. filefrag shows such extents as allocated but unwritten.

   2:        3..      15:    7628851..   7628863:     13:    7629020: unwritten

This is not good enough if the intent is to be able to read the correct data directly from the block device. It only reserves the storage space needed for future writes.
