Ext4 can use 1kB, 2kB or 4kB as the block size; as far as I know the default on Ubuntu is 4kB. Note that here, a block is the unit in which file data is allocated, and its size is constant for a given filesystem. The file you describe has two blocks that are not all zeroes: the one containing hello (surrounded by zeroes: 3616 bytes before and 474 after), and the one containing there (preceded by zeroes, and holding only 3148 bytes, after which the end of the file is reached). That makes two blocks of 4kB in total.
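The byte counts above can be checked with a little shell arithmetic. This is only a sketch: it assumes 4096-byte blocks, a 6-byte write ("hello" plus a newline) at offset 20000, and a total file size of 40012 bytes.

```shell
# Assumed values: 4096-byte blocks, 6 bytes of data written at offset 20000,
# and a file that is 40012 bytes long.
block_size=4096
offset=20000
len=6
file_size=40012

# Zero bytes between the start of the block and the data.
zeros_before=$(( offset % block_size ))
# Zero bytes between the end of the data and the end of the block.
zeros_after=$(( block_size - zeros_before - len ))
# Bytes of the file that fall in the last (partial) block.
last_block_bytes=$(( file_size % block_size ))

echo "zero bytes before the data: $zeros_before"
echo "zero bytes after the data:  $zeros_after"
echo "bytes in the final block:   $last_block_bytes"
```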
In the ls output, blocks are an arbitrary unit chosen by the ls command, defaulting to 1kB. Two blocks of 4kB each are allocated to hold file data, so the allocated size of the file is 8kB.
Your confusion may be due to two things. First, a block size of 2048 bytes is possible, but it's not the default value under Ubuntu (or most modern distributions), and it's apparently not the value on your system. You can check the block size by running tune2fs -l /dev/sdz42 (use the actual path to your filesystem device).
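If you can't read the device directly (tune2fs usually needs root), the filesystem can be asked for its block size instead. A minimal sketch, assuming GNU coreutils stat; fs_block_size is just a helper name I made up:

```shell
# Print the block size of the filesystem that holds a path
# (defaults to the current directory).
# Assumes GNU stat: -f queries the filesystem instead of the file,
# and %S is its fundamental block size.
fs_block_size() {
    stat -f -c %S "${1:-.}"
}

fs_block_size /
```

On an ext4 filesystem created with default options I would expect this to print 4096, but the value depends on how the filesystem was made.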
Second, a sparse file saves space by not storing blocks that consist entirely of zeroes. If a block (which is of necessity aligned on a block size boundary, at least for most filesystems including ext4) contains zeroes and other things, then the full block is stored on the disk. Thus, in that 40012-byte file (how did you get to 40013, by the way?), there are 4 all-zero blocks that are not stored, then one stored block containing hello surrounded by zeroes, then 4 more all-zero blocks that are not stored, and a final partial block containing zeroes and there.
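To see that layout block by block, here is a rough sketch that classifies each block as stored or hole. The data ranges are assumptions matching the file described above (6 bytes at offset 20000 and 6 bytes at offset 40006); a real filesystem may of course make different allocation decisions.

```shell
block_size=4096
file_size=40012
# Byte ranges holding non-zero data, written as start:length (assumed layout).
data_ranges="20000:6 40006:6"

# Number of blocks needed to cover the file, rounding up.
nblocks=$(( (file_size + block_size - 1) / block_size ))
stored=0
holes=0
layout=""
b=0
while [ "$b" -lt "$nblocks" ]; do
    blk_start=$(( b * block_size ))
    blk_end=$(( blk_start + block_size ))   # exclusive end of this block
    kind=hole
    for r in $data_ranges; do
        start=${r%%:*}
        len=${r##*:}
        # The block must be stored if any data range overlaps it.
        if [ "$start" -lt "$blk_end" ] && [ $(( start + len )) -gt "$blk_start" ]; then
            kind=data
        fi
    done
    if [ "$kind" = data ]; then
        stored=$(( stored + 1 ))
        layout="${layout}D"
    else
        holes=$(( holes + 1 ))
        layout="${layout}."
    fi
    b=$(( b + 1 ))
done

# D marks a stored block, a dot marks a hole.
echo "$layout"
echo "stored=$stored holes=$holes allocated=$(( stored * block_size )) bytes"
```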
Note that your utility can be written in terms of standard shell commands:
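    n=20000
    while IFS= read -r line; do
      # Writes nothing, but moves the output position forward by $n bytes.
      dd bs=1 seek="$n" </dev/null
      # The line itself lands right after the gap.
      echo "$line"
    done >testfile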
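For example, feeding the loop two lines should recreate the file discussed above. This is a sketch assuming a dd (such as GNU dd) that seeks relative to the current position of its standard output; it works in a temporary directory so nothing of yours gets overwritten.

```shell
# Work in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

n=20000
printf 'hello\nthere\n' |
while IFS= read -r line; do
    # Skip $n bytes forward in the output, leaving a gap; silence dd's statistics.
    dd bs=1 seek="$n" </dev/null 2>/dev/null
    echo "$line"
done >testfile

# Apparent size: the gaps plus the two 6-byte lines.
apparent=$(( $(wc -c <testfile) ))
echo "apparent size: $apparent bytes"

# Allocated size in 1kB units; this depends on the filesystem and its block size.
du -k testfile
```

On ext4 with 4kB blocks, du should show far less than the apparent size, since only the blocks that hold data are allocated.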
The links you give explicitly state:
The st_blocks field indicates the number of blocks allocated to the file, 512-byte units.
So st_blocks is always in units of 512 bytes, regardless of what underlying device is used. The stat command simply displays what the stat system call returns. The 512-byte block is a historical convention, defined in POSIX. Compare, for example:
    $ ls -s smallfile.txt
    4 smallfile.txt
    $ env POSIXLY_CORRECT=1 ls -s smallfile.txt
    8 smallfile.txt
GNU ls displays sizes in 1kB blocks by default, but when forced to comply with POSIX it uses 512-byte blocks.
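The two figures are the same allocation in different units. A small sketch of the conversion (blocks512_to_kib is a made-up helper name):

```shell
# Convert a count of 512-byte units (as in st_blocks) to 1kB units, rounding up.
blocks512_to_kib() {
    echo $(( ($1 + 1) / 2 ))
}

blocks512_to_kib 8     # the 8 from the POSIX-mode ls above; prints 4

# With GNU stat, %b is the st_blocks count and %B the size of the unit it uses:
#   stat -c '%b blocks of %B bytes' smallfile.txt
```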
Best Answer
I think I know how it works.
I connected another disk to my machine because it has a big, almost empty partition (~458G). I checked its free space with e2freefrag. It's just contiguous free blocks: because the partition is almost empty, there's lots of free space, and you have 228 chunks of 1-2G.
I placed a big 2.5G file in the partition, and the e2freefrag output changed a little.
This doesn't tell you anything about the allocated extents, but it gave me some ideas. When I looked at the file with e4defrag, the output listed extents of 32768 blocks. That number counts 4K blocks, which equals 128MiB. Some of the extents have fewer blocks and I don't know why: the filesystem is empty, so I would expect all the extents to have 32768 blocks.
Anyway, I checked the main partition to see its free space.
There, no run of contiguous free blocks is big enough to provide 128M (or more) of space, and that's why the wiki says you can have extents "up to" 128M.
I'm not sure why the file in question has 10 extents, because there are still 16 chunks that are at least 32M.
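For reference, the 128MiB figure is just the block count times the block size. A quick sketch, assuming 4K blocks and treating the 2.5G file as exactly 2560MiB (min_extents is a made-up helper name):

```shell
block_size=4096            # assumed filesystem block size in bytes
max_extent_blocks=32768    # largest extent length seen in the output

max_extent_bytes=$(( max_extent_blocks * block_size ))
echo "max extent: $(( max_extent_bytes / 1024 / 1024 )) MiB"

# Smallest number of extents a file of the given size (in bytes) could use,
# if every extent were as large as possible.
min_extents() {
    echo $(( ($1 + max_extent_bytes - 1) / max_extent_bytes ))
}

min_extents $(( 2560 * 1024 * 1024 ))
```

This only gives a lower bound; a file ends up with more extents whenever the free space does not offer runs that large.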