Sparse files/file holes and unexpected block size

c filesystems sparse-files

For my own learning, I've been playing around with creating files with file holes. I created a util that simply reads from stdin and writes to a file, but before writing to the file, it uses lseek to move beyond the end of file by a number of bytes.

fh -b 20000 testfile
hello
there

After starting this process, input can be entered ("hello") and is written to testfile, but first the program seeks 20000 bytes past the end of the file. Then, before "there" is written, it seeks another 20000 bytes past the end of the file.
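For reference, the behaviour described here can be approximated in a few lines of Python. This is only a sketch, not the actual fh source: write_with_holes is a made-up name, and it assumes each input line is written with its trailing newline.

```python
import os

def write_with_holes(path, hole, chunks):
    """Before each write, seek `hole` bytes past the current end of file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            # New position = current file size + hole
            os.lseek(fd, hole, os.SEEK_END)
            # Writing here extends the file; the skipped range reads back as zeroes
            os.write(fd, chunk)
    finally:
        os.close(fd)
    return os.stat(path).st_size  # logical size, holes included
```

Calling write_with_holes("testfile", 20000, [b"hello\n", b"there\n"]) should leave "hello" at offset 20000 and "there" at offset 40006.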

What I'm not clear on is the number of blocks allocated to the newly created file. If I do

ls -ls testfile

it shows 8 blocks are allocated, and the file size is 40013 (which is expected).

A new file with 13 bytes (but no file holes) is allocated 4 blocks according to ls -ls. I found out that this really means 1 block (2048 bytes for a block), with the count reported in units of 512 bytes. Presuming this is true, the math doesn't work out for the file with file holes. Why are 8 blocks allocated? Shouldn't it still be only 4, since the physical file size is only 13 bytes (as opposed to the logical size of 40013)?

I'm not sure if I'm reading the block size correctly, and secondly, I don't understand why the block count is 8 considering that a similar-size file with no file holes has only 4.

I'm running Ubuntu 11.10 on an ext4 file system.

Best Answer

Ext4 can use 1kB, 2kB or 4kB as the block size; as far as I know the default on Ubuntu is 4kB. Note that here, a block is the size of a file chunk, which is constant for a given filesystem. The file you describe has two blocks that are not all zeroes: the one containing hello (surrounded by a bunch of zeroes: 3616 before and 474 after), and the one containing there (preceded by a bunch of zeroes, and containing only 3148 bytes, after which the end of the file is reached). The total is two blocks of 4kB.
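The block arithmetic can be checked with a short Python sketch. It assumes a 4096-byte block size and that each line is written with its trailing newline (6 bytes each); layout is a hypothetical helper name, not part of any tool.

```python
def layout(hole, chunks, block=4096):
    """Replay 'seek `hole` bytes past EOF, then write' and list the blocks touched."""
    size = 0
    used = set()
    for chunk in chunks:
        if not chunk:
            continue                      # a zero-length write changes nothing
        start = size + hole               # offset of the first byte written
        end = start + len(chunk)          # one past the last byte written
        used.update(range(start // block, (end - 1) // block + 1))
        size = end                        # the file now ends after this chunk
    return size, sorted(used)

size, used = layout(20000, [b"hello\n", b"there\n"])
print(size)                      # 40012
print(used)                      # [4, 9]
print(len(used) * 4096 // 1024)  # 8 (kB allocated)
```

Only blocks 4 and 9 hold data. Offset 20000 falls 3616 bytes into block 4 (20000 - 4 * 4096), and block 9 ends at byte 3148 of the block (40012 - 9 * 4096).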

In the ls output, blocks are an arbitrary unit chosen by the ls command and defaulting to 1kB. There are 2 blocks of 4kB each allocated to contain file data, therefore the allocated size for the file is 8kB.
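To see how the units relate, here is a rough Python sketch. It assumes Linux, where st_blocks from stat counts 512-byte units, and it assumes ls rounds up to its own 1kB unit; to_ls_units and allocation are illustrative names only.

```python
import os

def to_ls_units(st_blocks, ls_unit=1024):
    """Convert a stat block count (512-byte units, assumed) to ls -s units."""
    allocated = st_blocks * 512
    return -(-allocated // ls_unit)       # ceiling division

def allocation(path, ls_unit=1024):
    """Return (logical size, allocated bytes, blocks as ls -s would show them)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512, to_ls_units(st.st_blocks, ls_unit)

print(to_ls_units(16))  # 8: two 4kB filesystem blocks
print(to_ls_units(8))   # 4: one 4kB filesystem block
```

This matches both observations in the question: the 13-byte file occupies one 4kB block (4 in ls), and the file with holes occupies two (8 in ls).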

Your confusion may be due to two things. First, the figure of 2048 bytes for a block is possible, but it's not the default value under Ubuntu (or most modern distributions), and it's apparently not the value on your system. You can check the block size by running tune2fs -l /dev/sdz42 (use the actual path to your filesystem device).
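If you don't have root access for tune2fs, the block size can usually be read without it. A minimal Python sketch, assuming Linux, where statvfs reports the filesystem block size in f_bsize for ext4 (other filesystems may report a preferred I/O size instead); fs_block_size is a made-up helper name.

```python
import os

def fs_block_size(path="."):
    """Block size reported by the filesystem that holds `path`."""
    return os.statvfs(path).f_bsize

print(fs_block_size("."))
```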

Second, a sparse file saves space only by not storing blocks that consist entirely of zeroes. If a block (which is of necessity aligned on a block size boundary, at least for most filesystems including ext4) contains zeroes and other things, then the full block is stored on the disk. Thus, in that 40012-byte file (how did you get to 40013, by the way?), there are 4 all-zero non-stored blocks, then one stored block containing hello surrounded by zeroes, then 4 more all-zero non-stored blocks, and a final partial block containing zeroes and there.
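One way to see which ranges are actually stored is to ask the kernel with lseek and SEEK_DATA/SEEK_HOLE. The following Python sketch assumes Linux and a filesystem that supports these flags (one that doesn't will typically report the whole file as a single data extent); data_extents is a hypothetical name.

```python
import errno
import os

def data_extents(path):
    """Return a list of (start, end) byte ranges that the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:   # nothing but a hole up to EOF
                    break
                raise
            # End of file counts as an implicit hole, so this always succeeds
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            pos = end
    finally:
        os.close(fd)
    return extents
```

On a filesystem with 4kB blocks, data_extents("testfile") would be expected to report one extent starting at 16384 and another starting at 36864, though the exact result depends on the filesystem.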

Note that your utility can be written in terms of standard shell commands:

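n=20000
while IFS= read -r line; do
  # Copy nothing, but move the shared output offset forward by $n bytes
  dd bs=1 seek="$n" </dev/null
  # Write the line at the new offset, leaving a hole behind it
  printf '%s\n' "$line"
done >testfile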