I think you're confused, possibly because you've read several documents that use different terminology. Terms like “block size” and “cluster size” don't have a universal meaning, even within the context of filesystem literature.
Filesystems
For ext2 or ext3, the situation is relatively simple: each file occupies a certain number of blocks. All blocks on a given filesystem have the same size, usually one of 1024, 2048 or 4096 bytes. A file¹ whose size is between N blocks plus one byte and N+1 blocks occupies N+1 blocks. That block size is what you specify with mke2fs -b
. There is no separate notion of clusters.
The FAT filesystem used in particular by MS-DOS and early versions of Windows has a similarly simple space allocation. What ext2 calls blocks, FAT calls clusters; the concept is the same.
Some filesystems have a more sophisticated allocation scheme: they have fixed-size blocks, but can use the same block to store the last few bytes of more than one file. This is known as block suballocation; Reiserfs and Btrfs do it, but not ext3 or even ext4.
Utilities
Unix utilities often use the word “block” to mean an arbitrarily-sized unit, typically 512 bytes or 1kB. This usage is unrelated to any particular filesystem or disk hardware. Historically, the 512B block did come about because disks and filesystems at the time often operated in 512B chunks, but the modern usage is just arbitrary. Traditional unix utilities and interfaces still use 512B blocks sometimes, though 1kB blocks are now often preferred. You need to check the documentation of each utility to know what size of block it's using (some have a switch, e.g. du -B
or df -B
on Linux).
In the GNU/Linux stat
utility, the blocks
figure is the number of 512B blocks used by the file. The IO Block
figure is the preferred size for file input-output, which is in principle unrelated but usually an indication of the underlying filesystem's block size (or cluster size if that's what you want to call it). Here, you have a 13-byte file, which is occupying one block on the ext3 filesystem with a block size of 2048; therefore the file occupies 4 512-byte units (called “blocks” by stat
).
Disks
Most disks present an interface that shows the disk as a bunch of sectors. The disk can only write or read a whole sector, not individual bits or bytes. Most hard disks have 512-byte sectors, though 4kB-sector disks started appearing a couple of years ago.
The disk sector size is not directly related to the filesystem block size, but having a block be a whole number of sectors is better for performance.
¹
Exception: sparse files save space .
Ext4 can use 1kB, 2kB or 4kB as the block size; as far as I know the default on Ubuntu is 4kB. Note that here, a block is the size of a file chunk, which is constant for a given filesystem. The file you describe has two blocks that are not zeroes: the one containing hello
(surrounded by a bunch of zeroes — 3616 before and 474 after), and the one containing here
(preceded by a bunch of zeroes, and containing only 3148 bytes, after which the end of the file is reached). The total is two blocks of 4kB.
In the ls
output, blocks are an arbitrary unit chosen by the ls
command and defaulting to 1kB. There are 2 blocks of 4kB each allocated to contain file data, therefore the allocated size for the file is 8kB.
Your confusion may be due to two things. First, the figure of 2048 bytes for a block is possible, but it's not the default value under Ubuntu (or most modern distributions), and it's apparently not the value on your system. You can check the block size by running tune2fs -l /dev/sdz42
(use the actual path to your filesystem device).
Second, sparse files consist of not storing blocks that are entirely made of zeroes. If a block (which is of necessity aligned on a block size boundary, at least for most filesystems including ext4) contains zeroes and other things, then the full block is stored on the disk. Thus, in that 40012-byte file (how did you get to 40013, by the way), there are 4 all-zero non-stored blocks, then one stored block containing hello
surrounded by zeroes, then 4 more all-zero non-stored blocks, and a final partial block containing zeroes and there
.
Note that your utility can be written in terms of standard shell commands:
n=20000
while IFS= read -r line; do
dd bs=1 seek=$n </dev/null
echo "$line"
done >testfile
Best Answer
Many disks have a sector size of 512 bytes, meaning that any read or write on the disk transfers a whole 512-byte sector at a time. It is quite natural to design filesystems where a sector is not split between files (that would complicate the design and hurt performance); therefore filesystems tend to use 512-byte chunks for files. Hence traditional utilities such as
ls
anddu
indicate sizes in units of 512-byte chunks.For humans, 512-byte units are not very meaningful. 1kB is the same order of magnitude and a lot more meaningful. A filesystem block (the smallest unit that a file is divided in) actually often consists of several sectors: 1kB, 2kB and 4kB are common filesystem block sizes; so the 512-byte unit is not strongly justified by the filesystem design, and there is no good reason other than tradition to use a 512-byte unit outside a disk driver at all.
So you have a tradition that doesn't have a lot going for it, and a more readable convention that's taking on. A bit like octal and hexadecimal: there isn't one that's right and one that's wrong, they're different ways of writing the same numbers.
Many tools have an option to select display units:
ls --block-size=512
for GNUls
, settingPOSIXLY_CORRECT=1
in the environment for GNUdf
and GNUdu
to get 512-byte units (or passing-k
to force 1kB units). What thestat
command in GNU coreutils exposes as the “block size” (the%B
value) is an OS-dependant value of an internal interface; depending on the OS, it may or may not be related to a size used by the filesystem or disk code (it usually isn't — see Difference between block size and cluster size). On Linux, the value is 512, regardless of what any underlying driver is doing. The value of%B
never matters, it's just a quirk that it exists at all.