Understanding sparse files, dd, seek, inode block structure

dd filesystems inode sparse-files

At work we use sparse files as part of our Oracle VM environment for the guest disk images. After some questions from a colleague (which have since been answered) I am left with more questions about sparse files, and perhaps more widely about inode structure – reading the man pages of stat(2) and statfs(2) (on FreeBSD) I get the impression that I'd understand more readily if I knew more C, but alas my knowledge of C is minimal at best…

I understand that some of this is dependent on file system type. I'm mostly interested in UFS on FreeBSD/Solaris and ext4 – ZFS would be a plus but I'm not going to hold out hope 🙂
I am using Solaris 10, FreeBSD 10.3, and CentOS 6.7 regularly. The commands here are being run on a CentOS 6.7 VM, but have been cross referenced with FreeBSD.
If possible, I'm interested in gaining an understanding from a POSIX viewpoint, and favouring FreeBSD over Linux if that isn't possible.

Consider the following set of commands:

printf "BIL" > /tmp/BIL

dd of=/tmp/sparse bs=1 count=0 seek=10
dd if=/tmp/BIL of=/tmp/sparse bs=1 count=3 seek=10

dd if=/tmp/BIL of=/tmp/sparse bs=1 count=3 seek=17

dd of=/tmp/sparse bs=1 count=0 seek=30
dd if=/tmp/BIL of=/tmp/sparse bs=1 count=3 seek=30

The file /tmp/BIL should have the contents (in hex) of 4942 004c, so when I hexdump the file /tmp/sparse I should see a smattering of this combination throughout:

%>hexdump sparse
0000000 0000 4942 004c 0000 0000 4942 004c 0000
0000010 4200 4c49 0000 0000 0000 0000 0000 4942
0000020 004c
0000021

%>cat sparse
BILBILBILBIL%

1. Why does the second occurrence of "BIL" appear out of order? i.e. 4200 4c49 rather than 4942 004c? This was written by the third dd command.

2. How do cat and other tools know to print in the correct order?

Using ls we can see the space allegedly used and the blocks allocated:

%>ls -ls /tmp/sparse
8.0K -rw-r--r--. 1 bil bil 33 May 26 14:17 /tmp/sparse

We can see that the alleged size is 33 bytes, but allocated size is 8 kilobytes (file system block size is 4K).

3. How do programs like ls discern between the "alleged" size and the allocated size?

I wondered if the "alleged" figure is stored in the inode while the allocated size is calculated by walking the direct and indirect blocks – though this cannot be correct, since calculation via walking would take time and tools such as ls return quickly, even for very large files.

4. What tools can I use to interrogate inode information?

I know of stat, but it doesn't seem to print out the values of all of the fields in an inode…

5. Is there a tool where I can walk the direct and indirect blocks?

It would be interesting to see each address on disk, and the contents, to gain a bit more understanding of how data is stored.

If I run the following command after the others above, the file /tmp/sparse is truncated:

%>dd of=/tmp/sparse bs=1 count=0 seek=5
%>hexdump sparse
0000000 0000 4942 004c
0000005

6. Why does dd truncate my file and can dd or another tool write into the middle of a file?

Lastly, sparse files seem like a Good Idea for preallocating space, but there don't appear to be any file system or operating system level assurances that a command won't truncate or arbitrarily grow the file.

7. Are there mechanisms to prevent sparse files from being shrunk/grown? And if not, why are sparse files useful?

While each question above could possibly be a separate SO question, I cannot dissect them as they are all related to the underlying understanding.

Best Answer

Some quick answers: first, you didn't create a sparse file. Try these extra commands:

dd if=/tmp/BIL of=/tmp/sparse seek=1000
ls -ls /tmp/sparse

You will see the size is 512003 bytes, but the file only takes 8 blocks. For null bytes to be stored as a hole in the filesystem, they have to fill a whole block that sits on a block boundary.
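
The 512003 figure follows from dd's default block size of 512 bytes: seek=1000 skips 1000 × 512 = 512000 bytes, and the 3 bytes of BIL are written after that. A minimal sketch to check the arithmetic, assuming a scratch directory created with mktemp (the directory and variable names are only illustrative):

```shell
# Scratch directory so nothing existing in /tmp is overwritten (hypothetical names).
dir=$(mktemp -d "${TMPDIR:-/tmp}/sparse.XXXXXX")
printf 'BIL' > "$dir/BIL"

# No bs= is given, so dd uses 512-byte blocks: seek=1000 -> offset 512000.
dd if="$dir/BIL" of="$dir/sparse" seek=1000 2>/dev/null

size=$(wc -c < "$dir/sparse" | tr -d ' ')   # apparent size in bytes
echo "apparent size: $size"
ls -ls "$dir/sparse"                        # first column: allocated blocks

rm -rf "$dir"
```

The block count printed by ls depends on the file system and on the unit ls uses, so it may not match the answer exactly.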

Why does the second occurrence of "BIL" appear out of order?

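Because you are on a little-endian system and hexdump, by default, displays the data as 16-bit words (shorts), so the two bytes of each word appear swapped. The second "BIL" starts at an odd offset, so it is split across words differently. Use a byte-wise display, such as hexdump -C, which shows bytes in file order like cat does.

How do cat and other tools know to print in the correct order?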

They work on bytes.
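
A byte-oriented dump makes the difference visible. The sketch below uses od on a 4-byte sample in which BIL starts at an odd offset, as the second occurrence does in the question; the temporary file name is a placeholder.

```shell
f=$(mktemp "${TMPDIR:-/tmp}/bytes.XXXXXX")
printf '\000BIL' > "$f"      # one NUL, then B I L: "BIL" starts at odd offset 1

# One byte at a time: always file order, regardless of CPU endianness.
bytes=$(od -An -tx1 "$f" | tr -d ' \n')
echo "$bytes"                # 0042494c

# Two-byte words: each pair is printed as a native 16-bit integer, so on a
# little-endian machine the bytes within every pair appear swapped.
od -An -tx2 "$f"

rm -f "$f"
```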

How do programs like ls discern between the "alleged" size and the allocated size?

ls and similar tools use the stat(2) system call, which returns, among other fields, these two values:
```
off_t     st_size;   /* total size, in bytes */
blkcnt_t  st_blocks; /* number of 512B blocks allocated */
```
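
The same two numbers can be compared from the shell without writing any C. A rough sketch, assuming du -k reports allocated space in 1 KiB units and that the file system supports holes; the file name and the 1 MiB size are arbitrary choices:

```shell
f=$(mktemp "${TMPDIR:-/tmp}/stat.XXXXXX")

# Extend the empty file to 1 MiB without writing any data (count=0).
dd of="$f" bs=1 count=0 seek=1048576 </dev/null 2>/dev/null

size=$(wc -c < "$f" | tr -d ' ')     # apparent size in bytes (st_size)
alloc_kb=$(du -k "$f" | cut -f1)     # space actually allocated (derived from st_blocks)

echo "apparent:  $size bytes"
echo "allocated: $alloc_kb KiB"      # far below 1024 if the file is stored sparsely

rm -f "$f"
```

Both values come straight from the inode, which is why ls can report them instantly even for very large files.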

What tools can I use to interrogate inode information?

stat is good.

Is there a tool where I can walk the direct and indirect blocks?

On ext2/3/4 you can use hdparm --fibmap with the filename:

$ sudo hdparm --fibmap ~/sparse 
filesystem blocksize 4096, begins at LBA 25167872; assuming 512 byte sectors.
byte_offset  begin_LBA    end_LBA    sectors
     512000  226080744  226080751          8

You can also use debugfs:

$ sudo debugfs /dev/sda3
debugfs:  stat <1040667>
Inode: 1040667   Type: regular    Mode:  0644   Flags: 0x0
Generation: 1161905167    Version: 0x00000000
User:   127   Group:   500   Size: 335360
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 664
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
atime: 0x4dd61e29 -- Fri May 20 09:54:17 2011
mtime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
Size of extra inode fields: 4
BLOCKS:
(0-11):4182714-4182725, (IND):4182726, (12-81):4182727-4182796
TOTAL: 83

Why does dd truncate my file and can dd or another tool write into the middle of a file?

Yes, dd can write into the middle. Add conv=notrunc.
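
Without conv=notrunc, dd keeps only the part of the output file it seeks over and drops the rest, which is why seek=5 with count=0 cut the file down to 5 bytes. A small sketch of an in-place overwrite, using a throwaway file whose name and contents are made up for the example:

```shell
f=$(mktemp "${TMPDIR:-/tmp}/notrunc.XXXXXX")
printf 'AAAAAAAAAA' > "$f"           # 10 bytes

# Overwrite 3 bytes at offset 3; conv=notrunc keeps everything after them.
printf 'BIL' | dd of="$f" bs=1 seek=3 conv=notrunc 2>/dev/null

result=$(cat "$f")
echo "$result"                       # AAABILAAAA

rm -f "$f"
```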

Are there mechanisms to prevent sparse files from being shrunk/grown? And if not, why are sparse files useful?

No. They are useful because they take less space on disk.

The sparse aspect of a file should be totally transparent to a program, which sometimes means the sparseness may be lost when the program updates a file.
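
One way to see this transparency: a program reading a hole simply gets zero bytes back. The sketch below creates a 4096-byte file without writing any data and then checks that every byte reads as NUL (the file name is a placeholder):

```shell
f=$(mktemp "${TMPDIR:-/tmp}/hole.XXXXXX")
dd of="$f" bs=1 count=0 seek=4096 </dev/null 2>/dev/null   # 4096 bytes, no data written

size=$(wc -c < "$f" | tr -d ' ')
# Delete every NUL byte and count what is left: 0 means the file reads as all zeros.
nonzero=$(tr -d '\000' < "$f" | wc -c | tr -d ' ')
echo "size=$size nonzero=$nonzero"   # size=4096 nonzero=0

rm -f "$f"
```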

Some copying utilities have options to preserve sparseness, e.g. tar --sparse and rsync --sparse.

Note that you can explicitly convert the suitably aligned zero blocks in a file into holes by using cp --sparse=always, and do the reverse, converting holes into real zeros, with cp --sparse=never.
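
A sketch of that conversion, assuming GNU cp (the --sparse option is not available in every cp implementation) and a file system that supports holes; the file names are invented for the example:

```shell
dir=$(mktemp -d "${TMPDIR:-/tmp}/cpsparse.XXXXXX")

# 1 MiB of real zero bytes, written out in full.
dd if=/dev/zero of="$dir/dense" bs=4096 count=256 2>/dev/null

# GNU cp only: detect runs of zeros and leave holes in the copy.
cp --sparse=always "$dir/dense" "$dir/holes"

du -k "$dir/dense" "$dir/holes"      # allocated KiB; the copy is normally much smaller
if cmp -s "$dir/dense" "$dir/holes"; then same=yes; else same=no; fi
echo "contents identical: $same"

rm -rf "$dir"
```

The two files compare equal byte for byte; only the amount of allocated space differs.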

Related Solutions

Sparse files/file holes and unexpected block size

Ext4 can use 1kB, 2kB or 4kB as the block size; as far as I know the default on Ubuntu is 4kB. Note that here, a block is the size of a file chunk, which is constant for a given filesystem. The file you describe has two blocks that are not zeroes: the one containing hello (surrounded by a bunch of zeroes: 3616 before and 474 after), and the one containing there (preceded by a bunch of zeroes, and containing only 3148 bytes, after which the end of the file is reached). The total is two blocks of 4kB.

In the ls output, blocks are an arbitrary unit chosen by the ls command and defaulting to 1kB. There are 2 blocks of 4kB each allocated to contain file data, therefore the allocated size for the file is 8kB.

Your confusion may be due to two things. First, the figure of 2048 bytes for a block is possible, but it's not the default value under Ubuntu (or most modern distributions), and it's apparently not the value on your system. You can check the block size by running tune2fs -l /dev/sdz42 (use the actual path to your filesystem device).

Second, sparse files work by not storing blocks that are entirely made of zeroes. If a block (which is of necessity aligned on a block size boundary, at least for most filesystems including ext4) contains zeroes and other things, then the full block is stored on the disk. Thus, in that 40012-byte file (how did you get to 40013, by the way?), there are 4 all-zero non-stored blocks, then one stored block containing hello surrounded by zeroes, then 4 more all-zero non-stored blocks, and a final partial block containing zeroes and there.
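
The block numbers and byte counts above can be reproduced with shell arithmetic. This sketch assumes a 4096-byte block size, a 20000-byte gap before each line, and that hello and there are each written with a trailing newline (6 bytes):

```shell
bs=4096          # file system block size assumed in the answer
n=20000          # size of the gap skipped before each line
len=6            # "hello\n" and "there\n" are 6 bytes each

start1=$n                      # offset of "hello"
start2=$((n + len + n))        # offset of "there"
total=$((start2 + len))        # resulting file size

echo "hello: block $((start1 / bs)), $((start1 % bs)) zero bytes before, $((bs - start1 % bs - len)) after"
echo "there: block $((start2 / bs)), which holds $((total - start2 / bs * bs)) bytes"
echo "total: $total bytes"
```

Blocks 0 to 3 and 5 to 8 contain nothing but zeroes, so only blocks 4 and 9 need to be stored.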

Note that your utility can be written in terms of standard shell commands:

n=20000
while IFS= read -r line; do
  dd bs=1 seek=$n </dev/null
  echo "$line"
done >testfile

What is inode for, in FreeBSD or Solaris

An inode is a structure in some file systems that holds a file or directory's metadata (all the information about the file, except its name and data). It holds information about permissions, ownership, creation and modification times, etc.
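
Because the name is kept in the directory rather than in the inode, two names can refer to one inode. A brief sketch using a hard link, assuming a file system that supports them (the names a and b are arbitrary):

```shell
dir=$(mktemp -d "${TMPDIR:-/tmp}/inode.XXXXXX")
echo data > "$dir/a"
ln "$dir/a" "$dir/b"                 # second name for the same inode

ino_a=$(ls -i "$dir/a" | awk '{print $1}')
ino_b=$(ls -i "$dir/b" | awk '{print $1}')
echo "a: inode $ino_a"
echo "b: inode $ino_b"
ls -l "$dir/a"                       # link count (second column) is now 2

rm -rf "$dir"
```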

Systems that offer a virtualised file system access layer (FreeBSD, Solaris, Linux) can support different underlying file systems, which may or may not use inodes. ReiserFS, for example, doesn't use them, whereas FreeBSD's ffs2 does. The abstraction layer through which you access the file system provides a single, well-defined interface for file operations, so that applications don't need to know about the differences between file system implementations.
