Ext4 can use 1kB, 2kB or 4kB as the block size; as far as I know the default on Ubuntu is 4kB. Note that here, a block is the size of a file chunk, which is constant for a given filesystem. The file you describe has two blocks that are not all zeroes: the one containing `hello` (surrounded by zeroes: 3616 before and 474 after), and the one containing `there` (preceded by zeroes, and containing only 3148 bytes, after which the end of the file is reached). The total is two blocks of 4kB.
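The figures above can be rechecked with a little arithmetic. This is only a sketch: it assumes a 4096-byte block size and that the two lines (each 5 characters plus a newline) start at offsets 20000 and 40006, which is what the 40012-byte file size implies. The helper name `locate` is made up for illustration.

```python
BLOCK_SIZE = 4096  # assumed ext4 block size


def locate(offset, length, block_size=BLOCK_SIZE):
    """Return (block index, bytes before the data, bytes after it) within its block."""
    block = offset // block_size
    before = offset - block * block_size
    after = block_size - before - length
    return block, before, after


# "hello\n" is 6 bytes written at offset 20000 (assumed layout)
print(locate(20000, 6))  # (4, 3616, 474)

# The last block is partial: it runs from its start to the end of the file.
file_size = 40012
last_block_start = ((file_size - 1) // BLOCK_SIZE) * BLOCK_SIZE
print(file_size - last_block_start)  # 3148
```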
In the `ls` output, blocks are an arbitrary unit chosen by the `ls` command and defaulting to 1kB. There are 2 blocks of 4kB each allocated to contain file data, therefore the allocated size for the file is 8kB, which `ls` shows as 8 of its 1kB blocks.
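To see the apparent size and the allocated size side by side without going through `ls`, something like the following should work. It is a sketch, not a replacement for `ls -s`: `st_blocks` is counted in 512-byte units on Linux, which is yet another unit, different from both the filesystem block size and the 1kB unit `ls` uses. The function name `sizes` is hypothetical.

```python
import os


def sizes(path):
    """Return (apparent size, allocated size) of a file, both in bytes."""
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * 512  # st_blocks is in 512-byte units on Linux
    return apparent, allocated
```

For the file discussed here, on a filesystem with 4kB blocks, one would expect `sizes("testfile")` to give an apparent size of 40012 and an allocated size of 8192, though the second figure depends on the filesystem.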
Your confusion may be due to two things. First, the figure of 2048 bytes for a block is possible, but it's not the default value under Ubuntu (or most modern distributions), and it's apparently not the value on your system. You can check the block size by running `tune2fs -l /dev/sdz42` (use the actual path to your filesystem device).
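If you would rather not look up the device or run `tune2fs` as root, the block size can also be queried through `statvfs` on any path inside the filesystem. A minimal sketch; the assumption here is that `f_bsize` matches the "Block size" line of `tune2fs -l`, which is what I would expect on ext4 but have not verified for other filesystems.

```python
import os


def fs_block_size(path="."):
    """Return the block size reported by statvfs for the filesystem holding path."""
    return os.statvfs(path).f_bsize


print(fs_block_size("."))
```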
Second, a sparse file works by not storing blocks that consist entirely of zeroes. If a block (which is of necessity aligned on a block size boundary, at least for most filesystems including ext4) contains zeroes and other things, then the full block is stored on the disk. Thus, in that 40012-byte file (how did you get to 40013, by the way?), there are 4 all-zero non-stored blocks, then one stored block containing `hello` surrounded by zeroes, then 4 more all-zero non-stored blocks, and a final partial block containing zeroes and `there`.
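The layout described above can be observed directly by asking the kernel where the data and the holes are, using `lseek` with `SEEK_DATA` and `SEEK_HOLE`. This is a sketch assuming Linux and Python 3.3 or later; the function name `data_segments` is made up. On a filesystem that does not track holes, the whole file is reported as one data segment.

```python
import errno
import os


def data_segments(path):
    """Return a list of (start, end) byte ranges that the filesystem reports as data."""
    segments = []
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # nothing but a hole after pos
                    break
                raise
            # The end of the file counts as a hole, so this always succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, end))
            pos = end
    return segments
```

With 4kB blocks, one would expect two segments for the file in question: one covering block 4 and one from the start of block 9 to the end of the file.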
Note that your utility can be written in terms of standard shell commands:
n=20000
while IFS= read -r line; do
  # advance the output position by $n bytes without writing anything (leaves a hole)
  dd bs=1 seek="$n" </dev/null
  echo "$line"
done >testfile
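For comparison, here is roughly the same loop in Python. It is a sketch under the same assumption as the shell version, namely that each line is preceded by a 20000-byte gap; the name `write_with_gaps` is made up.

```python
import os


def write_with_gaps(lines, path, gap=20000):
    """Write each line to path, seeking gap bytes forward before it."""
    with open(path, "wb") as out:
        for line in lines:
            out.seek(gap, os.SEEK_CUR)  # skipped bytes are never written
            out.write(line.encode() + b"\n")


# Usage, reading lines from standard input like the shell loop does:
#   import sys
#   write_with_gaps((l.rstrip("\n") for l in sys.stdin), "testfile")
```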
Best Answer
Edit 2015: as of util-linux 2.25, the `fallocate` utility on Linux has a `-d`/`--dig-hole` option for that. Running `fallocate -d` on a file digs a hole for every block full of zeros in it.
On older systems, you can do it by hand:
Linux has a `FALLOC_FL_PUNCH_HOLE` option to `fallocate` that can do this. I found a script on GitHub with an example: Using FALLOC_FL_PUNCH_HOLE from Python
I modified it a bit to do what you asked -- punch holes in regions of files that are filled with zeros. Here it is:
Using FALLOC_FL_PUNCH_HOLE from Python to punch holes in files
Example:
Note that `punch.py` only finds blocks of 4096 bytes to punch out, so it might not make a file exactly as sparse as it was when you started. It could be made smarter, of course. Also, it's only lightly tested, so be careful and make backups before trusting it!
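To give an idea of what such a script involves, here is a minimal sketch of the same approach. It is not the linked `punch.py`: the function names are made up, and it assumes 64-bit Linux with glibc (so that `off_t` is 64 bits wide) and a filesystem that supports hole punching. Like the script described above, it only considers whole 4096-byte blocks.

```python
import ctypes
import ctypes.util
import os

# Flag values from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def zero_block_ranges(path, block_size=4096):
    """Return merged (offset, length) ranges of full blocks containing only zeros."""
    zero = bytes(block_size)
    ranges = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            if chunk == zero:  # a short final chunk never equals a full zero block
                if ranges and ranges[-1][0] + ranges[-1][1] == offset:
                    # extend the previous range instead of starting a new one
                    ranges[-1] = (ranges[-1][0], ranges[-1][1] + block_size)
                else:
                    ranges.append((offset, block_size))
            offset += len(chunk)
    return ranges


def punch_holes(path, block_size=4096):
    """Deallocate every all-zero block of path; the file size is left unchanged."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    ranges = zero_block_ranges(path, block_size)
    fd = os.open(path, os.O_RDWR)
    try:
        for offset, length in ranges:
            mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
            if libc.fallocate(fd, mode, offset, length) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    return ranges
```

Since the punched ranges read back as zeros anyway, the file contents should be identical before and after; only the allocated size changes. The same warning applies here: try it on a copy first.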