Ext4 can use 1kB, 2kB or 4kB as the block size; as far as I know the default on Ubuntu is 4kB. Note that here, a block is the unit in which file data is allocated, and its size is fixed for a given filesystem. The file you describe has two blocks that are not entirely zeroes: the one containing hello (surrounded by zeroes: 3616 bytes before and 474 after), and the one containing there (preceded by zeroes, and containing only 3148 bytes, after which the end of the file is reached). The total is two blocks of 4kB.
In the ls output, blocks are an arbitrary unit chosen by the ls command, defaulting to 1kB (it can be changed with --block-size). There are 2 blocks of 4kB each allocated to contain file data, so the allocated size of the file is 8kB, which ls shows as 8 in its default 1kB unit.
Your confusion may be due to two things. First, a block size of 2048 bytes is possible, but it's not the default under Ubuntu (or most modern distributions), and it's apparently not the value on your system. You can check the block size by running tune2fs -l /dev/sdz42 (use the actual path to your filesystem device) and looking at the "Block size" line.
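If you can't read the device as a normal user, the kernel can also be asked through statvfs(2). A minimal Python sketch, assuming that f_bsize (the filesystem's preferred I/O size) equals the ext4 block size, which is normally the case on ext4 but not guaranteed for every filesystem; tune2fs -l stays the authoritative answer. The fs_block_size name is only for illustration.

```python
import os


def fs_block_size(path: str) -> int:
    """Return f_bsize from statvfs(2) for the filesystem containing path.

    Assumption: on ext4 this matches the "Block size" line of
    `tune2fs -l`; other filesystems may report a different figure.
    """
    return os.statvfs(path).f_bsize


if __name__ == "__main__":
    # "/" is only an example; pass any path on the filesystem you care about.
    print(fs_block_size("/"))
```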
Second, a sparse file is one in which blocks made entirely of zeroes are not stored. If a block (which is necessarily aligned on a block-size boundary, at least for most filesystems including ext4) contains zeroes and anything else, then the whole block is stored on disk. Thus, in that 40012-byte file (how did you get 40013, by the way?), there are 4 all-zero blocks that are not stored, then one stored block containing hello surrounded by zeroes, then 4 more all-zero blocks that are not stored, and a final partial block containing zeroes and there.
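The block arithmetic above can be checked with a few lines of Python. This sketch assumes 4096-byte blocks and, as in the shell version of the utility, that "hello\n" lands at offset 20000 and "there\n" 20000 bytes after the end of it (offset 40006); block_layout is a hypothetical helper.

```python
BLOCK_SIZE = 4096  # assumed ext4 block size


def block_layout(offset: int, length: int, block_size: int = BLOCK_SIZE):
    """Return (block index, zero bytes before, bytes after) for a write
    that fits inside one block; 'after' counts up to the block's end."""
    block = offset // block_size
    before = offset - block * block_size
    after = block_size - before - length
    return block, before, after


# "hello\n" (6 bytes) written at offset 20000
print(block_layout(20000, 6))  # (4, 3616, 474)

# "there\n" written at 40006, so the file ends at 40012
file_size = 40012
last_block = (file_size - 1) // BLOCK_SIZE
print(last_block, file_size - last_block * BLOCK_SIZE)  # 9 3148

# Every other block up to the last one holds only zeroes
data_blocks = {20000 // BLOCK_SIZE, 40006 // BLOCK_SIZE}
holes = [b for b in range(last_block + 1) if b not in data_blocks]
print(holes)  # [0, 1, 2, 3, 5, 6, 7, 8]
```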
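Note that your utility can be written in terms of standard shell commands:
n=20000
while IFS= read -r line; do
  # Advance the output offset by $n bytes without writing anything;
  # dd seeks its (shared) stdout relative to the current position.
  dd bs=1 seek=$n </dev/null
  # Then write the line itself at the new offset.
  echo "$line"
done >testfile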
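For comparison, the same idea in Python: a minimal sketch, assuming the input is a list of lines and the gap is a fixed 20000 bytes as in the shell version. write_spaced_lines is a hypothetical name.

```python
import os
import tempfile
from typing import Iterable


def write_spaced_lines(path: str, lines: Iterable[str], gap: int = 20000) -> None:
    """Mimic the shell loop: before each line, move the offset `gap` bytes
    forward (relative to the current position) without writing anything,
    then write the line followed by a newline.

    Seeking past end-of-file and then writing leaves a hole; whether the
    skipped range ends up unallocated is up to the filesystem.
    """
    with open(path, "wb") as f:
        for line in lines:
            f.seek(gap, os.SEEK_CUR)  # like `dd bs=1 seek=$n` on the shared stdout
            f.write(line.encode() + b"\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "testfile")
        write_spaced_lines(path, ["hello", "there"])
        print(os.path.getsize(path))  # 40012
```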
Some quick answers: first, you didn't create a sparse file. Try these extra commands:
dd if=/tmp/BIL of=/tmp/sparse seek=1000
ls -ls /tmp/sparse
dd's default block size is 512 bytes, so seek=1000 skips 512000 bytes before writing the contents of /tmp/BIL. You will see that the size is 512003 bytes, but only 8 blocks of 512 bytes (a single 4kB filesystem block) are allocated. Zero bytes can only be left sparse in the filesystem if they fill a whole block that starts on a block boundary.
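The alignment rule can be modelled with a short Python sketch that only looks at the file's contents: it reports which block-aligned chunks are entirely zero and could therefore be holes. The 4096-byte block size and the hole_candidates name are assumptions for illustration; it does not query what the filesystem actually allocated.

```python
from typing import List


def hole_candidates(data: bytes, block_size: int = 4096) -> List[int]:
    """Return the indexes of blocks that consist only of zero bytes.

    Blocks are cut at multiples of block_size from the start of the file,
    mirroring how the filesystem aligns them; the last block may be partial.
    """
    holes = []
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        if chunk == bytes(len(chunk)):
            holes.append(start // block_size)
    return holes


if __name__ == "__main__":
    # Same layout as `dd if=/tmp/BIL of=/tmp/sparse seek=1000`, assuming
    # /tmp/BIL holds the 3 bytes "BIL": 512000 zero bytes, then the data.
    print(len(hole_candidates(bytes(512000) + b"BIL")))  # 125 (blocks 0-124)
    # 8190 zero bytes, but no aligned block is entirely zero:
    print(hole_candidates(b"x" + bytes(8190) + b"y"))  # []
```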
Why does the second occurrence of "BIL" appear out of order?
Because you are on a little-endian system and you are displaying the output as shorts (2-byte words), so the two bytes of each word are shown swapped. Use bytes, like cat does.
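To see why word-sized output looks scrambled, here is a minimal Python sketch. It assumes a dump that reads the file as little-endian 16-bit words and prints each word most-significant byte first; the as_shorts helper is hypothetical and only models that behaviour.

```python
import struct
from typing import List


def as_shorts(data: bytes) -> List[bytes]:
    """Read data as little-endian 16-bit words and return each word's two
    bytes in the order a word-oriented dump prints them (high byte first).
    A trailing odd byte is ignored."""
    count = len(data) // 2
    words = struct.unpack("<%dH" % count, data[:2 * count])
    return [struct.pack(">H", w) for w in words]


print(as_shorts(b"BIL\n"))  # [b'IB', b'\nL']
print(list(b"BIL\n"))       # [66, 73, 76, 10], file order, as cat sees it
```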
How do cat and other tools know to print in the correct order?
They work on bytes.
How do programs like ls discern between the "alleged" size and the allocated size?
ls and similar tools use the stat(2) system call, which returns, among other fields, these two values:
off_t st_size; /* total size, in bytes */
blkcnt_t st_blocks; /* number of 512B blocks allocated */
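The same two fields are available from Python's os.stat, which wraps stat(2). A sketch, assuming Linux, where st_blocks is counted in 512-byte units (POSIX leaves the unit unspecified); size_report is a hypothetical helper. Dividing the allocated bytes by 1024 gives roughly the figure ls -s prints with its default 1kB unit.

```python
import os
import tempfile
from typing import Tuple


def size_report(path: str) -> Tuple[int, int]:
    """Return (apparent size, allocated bytes) for path.

    Assumption (Linux): st_blocks counts 512-byte units regardless of
    the filesystem block size, so allocated bytes = st_blocks * 512.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse")
        with open(path, "wb") as f:
            f.truncate(1_000_000)  # extend the file without writing any data
        size, allocated = size_report(path)
        # On ext4 `allocated` is typically far smaller than `size` here.
        print(size, allocated)
```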
What tools can I use to interrogate inode information?
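The stat(1) command is good: it prints the size, the number of allocated blocks, the inode number and the timestamps, among other fields.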
Is there a tool where I can walk the direct and indirect blocks?
On ext2/3/4 you can use hdparm --fibmap with the filename:
$ sudo hdparm --fibmap ~/sparse
filesystem blocksize 4096, begins at LBA 25167872; assuming 512 byte sectors.
byte_offset  begin_LBA    end_LBA   sectors
     512000  226080744  226080751         8
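hdparm --fibmap and debugfs show physical block numbers and need root. If you only want the logical layout (which byte ranges hold data and which are holes), a different technique works as an ordinary user: lseek(2) with SEEK_DATA and SEEK_HOLE. A sketch in Python, assuming Linux; filesystems that don't track holes simply report the whole file as data. data_extents is a hypothetical name.

```python
import errno
import os
from typing import List, Tuple


def data_extents(path: str) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges that contain data, found with
    lseek(SEEK_DATA) / lseek(SEEK_HOLE). Assumes Linux; the ranges are
    reported at the filesystem's granularity, not byte-exact."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no data at or after offset
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

On the /tmp/sparse file above, an ext4 filesystem would be expected to report a single range near the end of the file, since the first 512000 bytes are a hole.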
You can also use debugfs:
$ sudo debugfs /dev/sda3
debugfs: stat <1040667>
Inode: 1040667 Type: regular Mode: 0644 Flags: 0x0
Generation: 1161905167 Version: 0x00000000
User: 127 Group: 500 Size: 335360
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 664
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
atime: 0x4dd61e29 -- Fri May 20 09:54:17 2011
mtime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
Size of extra inode fields: 4
BLOCKS:
(0-11):4182714-4182725, (IND):4182726, (12-81):4182727-4182796
TOTAL: 83
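The figures in this debugfs output agree with each other. A quick check, assuming 4096-byte filesystem blocks and that Blockcount is counted in 512-byte sectors (which is what the numbers imply):

```latex
\begin{aligned}
\left\lceil \tfrac{335360}{4096} \right\rceil &= \lceil 81.875 \rceil = 82 \quad \text{data blocks (logical blocks 0 to 81)}\\
82 + 1 \ \text{(IND)} &= 83 = \text{TOTAL}\\
83 \times \tfrac{4096}{512} &= 83 \times 8 = 664 = \text{Blockcount}
\end{aligned}
```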
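Why does dd truncate my file and can dd or another tool write into the middle of a file?
By default dd truncates the output file named with of= (if you also give seek=, it keeps only the part before the seek offset). Yes, dd can write into the middle: add conv=notrunc.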
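In a program, the equivalent of conv=notrunc is to open the file without O_TRUNC and write at an explicit offset. A minimal Python sketch using os.pwrite (Unix only); overwrite_at is a hypothetical name.

```python
import os


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Write data at offset without truncating the file (like dd conv=notrunc).

    O_WRONLY without O_TRUNC keeps the existing contents; pwrite() writes at
    an absolute offset. Writing past the end extends the file with a hole.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(data)
        while view:  # loop in case of a short write
            n = os.pwrite(fd, view, offset)
            view = view[n:]
            offset += n
    finally:
        os.close(fd)
```

For example, on a file containing 0123456789, overwrite_at(path, 3, b"abc") leaves 012abc6789.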
Are there mechanisms to prevent sparse files from being shrunk/grown? And if not, why are sparse files useful?
No. They are useful because they take less space: the holes occupy no disk blocks.
Sparseness should be totally transparent to a program (reading a hole simply returns zeroes), which means the sparseness may be lost when a program rewrites or copies a file.
Some copying utilities have options to preserve sparseness, e.g. tar --sparse and rsync --sparse.
Note that you can explicitly convert the suitably aligned zero blocks in a file into holes with cp --sparse=always, and do the reverse, turning holes into real zeroes, with cp --sparse=never.
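A rough Python sketch of what making a copy sparse involves: read in block-sized chunks and seek over chunks that are entirely zero instead of writing them. The 4096-byte block size and the sparse_copy name are assumptions; this is not how cp is implemented, only the underlying idea.

```python
import os

BLOCK_SIZE = 4096  # assumed filesystem block size


def sparse_copy(src: str, dst: str, block_size: int = BLOCK_SIZE) -> None:
    """Copy src to dst, seeking over all-zero block-aligned chunks instead
    of writing them, so the filesystem can leave them as holes."""
    zero = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)  # chunks start on block boundaries
            if not chunk:
                break
            if chunk == zero[:len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # skip: leave a hole
            else:
                fout.write(chunk)
        # If the file ends in skipped zeroes nothing was written there;
        # truncating at the current offset sets the correct final size.
        fout.truncate()
```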
I haven't tested it, but there is a write-devices patch to rsync, which would solve your problem. You can find the patch in the rsync-patches repository.