Understanding sparse files, dd, seek, inode block structure

Tags: dd, filesystems, inode, sparse-files

At work we use sparse files as part of our Oracle VM environment for the guest disk images. After some questions from a colleague (which have since been answered) I am left with more questions about sparse files, and perhaps more widely about inode structure – reading the man pages of stat(2) and statfs(2) (on FreeBSD) I get the impression that I'd understand more readily if I knew more C, but alas my knowledge of C is minimal at best…

I understand that some of this is dependent on file system type. I'm mostly interested in UFS on FreeBSD/Solaris and ext4 – ZFS would be a plus but I'm not going to hold out hope 🙂
I am using Solaris 10, FreeBSD 10.3, and CentOS 6.7 regularly. The commands here are being run on a CentOS 6.7 VM, but have been cross referenced with FreeBSD.
If possible, I'm interested in gaining an understanding from a POSIX viewpoint, and favouring FreeBSD over Linux if that isn't possible.

Consider the following set of commands:

printf "BIL" > /tmp/BIL

dd of=/tmp/sparse bs=1 count=0 seek=10
dd if=/tmp/BIL of=/tmp/sparse bs=1 count=3 seek=10

dd if=/tmp/BIL of=/tmp/sparse bs=1 count=3 seek=17

dd of=/tmp/sparse bs=1 count=0 seek=30
dd if=/tmp/BIL of=/tmp/sparse bs=1 count=3 seek=30

The file /tmp/BIL should have the contents (in hex) of 4942 004c, so when I hexdump the file /tmp/sparse I should see a smattering of this combination throughout:

%>hexdump sparse
0000000 0000 4942 004c 0000 0000 4942 004c 0000
0000010 4200 4c49 0000 0000 0000 0000 0000 4942
0000020 004c
0000021

%>cat sparse
BILBILBILBIL%

1. Why does the second occurrence of "BIL" appear out of order? i.e. 4200 4c49 rather than 4942 004c? This was written by the third dd command.

2. How do cat and other tools know to print in the correct order?

Using ls we can see the space allegedly used and the blocks allocated:

%>ls -ls /tmp/sparse
8.0K -rw-r--r--. 1 bil bil 33 May 26 14:17 /tmp/sparse

We can see that the alleged size is 33 bytes, but allocated size is 8 kilobytes (file system block size is 4K).

3. How do programs like ls discern between the "alleged" size and the allocated size?

I wondered if the "alleged" figure is stored in the inode while the allocated size is calculated by walking the direct and indirect blocks – though this cannot be correct, since calculation via walking would take time and tools such as ls return quickly, even for very large files.

4. What tools can I use to interrogate inode information?

I know of stat, but it doesn't seem to print out the values of all of the fields in an inode…

5. Is there a tool where I can walk the direct and indirect blocks?

It would be interesting to see each address on disk, and its contents, to gain a bit more understanding of how data is stored.

If I run the following command after the others above, the file /tmp/sparse is truncated:

%>dd of=/tmp/sparse bs=1 count=0 seek=5
%>hexdump sparse
0000000 0000 4942 004c
0000005

6. Why does dd truncate my file and can dd or another tool write into the middle of a file?

Lastly, sparse files seem like a Good Idea for preallocating space, but there don't appear to be any file system or operating system level assurances that a command won't truncate or arbitrarily grow the file.

7. Are there mechanisms to prevent sparse files from being shrunk/grown? And if not, why are sparse files useful?


While each question above could possibly be a separate SO question, I cannot dissect them as they are all related to the underlying understanding.

Best Answer

Some quick answers: first, you didn't create a sparse file. Try these extra commands:

dd if=/tmp/BIL of=/tmp/sparse seek=1000
ls -ls /tmp/sparse

You will see the size is 512003 bytes, but the file only takes 8 blocks. For a run of null bytes to be stored as a hole in the filesystem, it has to cover at least one whole block and start on a block boundary.
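To check this for yourself, here is a small sketch that repeats the experiment in a scratch directory and prints both figures side by side. It assumes GNU coreutils (`stat -c` is the GNU spelling; FreeBSD's `stat` uses `-f` with different format codes), and the scratch path from `mktemp -d` is only for illustration.

```shell
# Assumes GNU coreutils (dd, stat -c). Paths are illustrative.
tmp=$(mktemp -d)
printf 'BIL' > "$tmp/BIL"

# dd's default block size is 512 bytes, so seek=1000 starts writing at byte 512000.
dd if="$tmp/BIL" of="$tmp/sparse" seek=1000 2>/dev/null

size=$(stat -c %s "$tmp/sparse")      # apparent size in bytes (st_size)
blocks=$(stat -c %b "$tmp/sparse")    # number of allocated blocks (st_blocks)
unit=$(stat -c %B "$tmp/sparse")      # size in bytes of each block counted by %b
echo "apparent size: $size bytes"
echo "allocated:     $((blocks * unit)) bytes"
```

On a filesystem that supports holes, the allocated figure should come out far smaller than the apparent size; the exact value depends on the filesystem and its block size.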

  1. Why does the second occurrence of "BIL" appear out of order?

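    Because you are on a little-endian system and hexdump prints 16-bit words (shorts) by default. That occurrence of "BIL" starts at an odd offset (17), so each of its bytes is paired with a neighbour from a different word and the pair is printed high byte first. Display single bytes instead (for example with hexdump -C or od -c), which is how cat sees the file.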

  2. How do cat and other tools know to print in the correct order?

    They work on bytes, in file order, so byte order within a word never comes into it.

  3. How do programs like ls discern between the "alleged" size and the allocated size?

    ls and similar tools use the stat(2) system call, which returns (among other fields) these two values, both kept in the inode:

    off_t     st_size;   /* total size, in bytes */
    blkcnt_t  st_blocks; /* number of 512B blocks allocated */
    
  4. What tools can I use to interrogate inode information?

    stat is good.

  5. Is there a tool where I can walk the direct and indirect blocks?

    On ext2/3/4 you can use hdparm --fibmap with the filename:

    $ sudo hdparm --fibmap ~/sparse 
    filesystem blocksize 4096, begins at LBA 25167872; assuming 512 byte sectors.
    byte_offset  begin_LBA    end_LBA    sectors
         512000  226080744  226080751          8
    

    You can also use debugfs:

    $ sudo debugfs /dev/sda3
    debugfs:  stat <1040667>
    Inode: 1040667   Type: regular    Mode:  0644   Flags: 0x0
    Generation: 1161905167    Version: 0x00000000
    User:   127   Group:   500   Size: 335360
    File ACL: 0    Directory ACL: 0
    Links: 1   Blockcount: 664
    Fragment:  Address: 0    Number: 0    Size: 0
    ctime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
    atime: 0x4dd61e29 -- Fri May 20 09:54:17 2011
    mtime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
    Size of extra inode fields: 4
    BLOCKS:
    (0-11):4182714-4182725, (IND):4182726, (12-81):4182727-4182796
    TOTAL: 83
    
  6. Why does dd truncate my file and can dd or another tool write into the middle of a file?

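    When seek= is given without conv=notrunc, dd truncates the output file at the seek offset before it writes anything (this is the behaviour POSIX specifies), which is why seek=5 with count=0 cut your file down to 5 bytes. Yes, dd can write into the middle: add conv=notrunc.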

  7. Are there mechanisms to prevent sparse files from being shrunk/grown? And if not, why are sparse files useful?

    No. They are useful because they take less space on disk than their size suggests.
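Answers 1 and 6 can both be reproduced with a short script. This is only a sketch: it assumes a POSIX shell with `dd` and `od` available, the scratch file names are made up for illustration, and the word-sized output noted in the comment is what a little-endian machine prints.

```shell
# Assumes a POSIX shell with dd and od; scratch paths are illustrative.
tmp=$(mktemp -d)

# Answer 1: the same four bytes (NUL, B, I, L) shown two ways.
printf '\000BIL' > "$tmp/odd"
od -An -tx2 "$tmp/odd"    # 16-bit words in host byte order (4200 4c49 on little-endian)
od -An -tx1 "$tmp/odd"    # one byte at a time, in file order
as_bytes=$(od -An -tx1 "$tmp/odd" | tr -d ' \n')

# Answer 6: overwrite bytes 4..6 of a 10-byte file, with and without conv=notrunc.
printf 'BIL' > "$tmp/BIL"
printf 'aaaaaaaaaa' > "$tmp/keep"
printf 'aaaaaaaaaa' > "$tmp/cut"
dd if="$tmp/BIL" of="$tmp/keep" bs=1 seek=4 conv=notrunc 2>/dev/null  # tail is preserved
dd if="$tmp/BIL" of="$tmp/cut"  bs=1 seek=4 2>/dev/null               # file is cut at the seek offset first
kept=$(cat "$tmp/keep")
cut=$(cat "$tmp/cut")
echo "with conv=notrunc: $kept"
echo "without:           $cut"
```

The last two lines should read aaaaBILaaa and aaaaBIL respectively: the first write leaves the trailing bytes alone, the second loses everything past the newly written data.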

The sparse aspect of a file should be totally transparent to a program, which sometimes means the sparseness may be lost when the program updates a file.
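One way to see that transparency: reading from a hole returns ordinary zero bytes, exactly as if they had been stored. A minimal sketch, assuming `dd`, `od` and `wc` are available; the file name is illustrative.

```shell
# Assumes a POSIX shell with dd, od and wc; the path is illustrative.
tmp=$(mktemp -d)

# Create a 1 MiB file without writing any data: seek to the end, copy nothing.
dd if=/dev/null of="$tmp/hole" bs=1 count=0 seek=1048576 2>/dev/null
hole_size=$(wc -c < "$tmp/hole")

# Read 8 bytes from the middle of the file and show them as hex bytes.
hole_bytes=$(dd if="$tmp/hole" bs=1 skip=4096 count=8 2>/dev/null | od -An -tx1 | tr -d ' \n')
echo "file size:           $hole_size bytes"
echo "bytes read at 4096:  $hole_bytes"
```

The reading program cannot tell from the data whether those zeros came from allocated blocks or from a hole, which is why a tool that rewrites the file may store them as real zeros.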

Some copying utilities have options to preserve sparseness, e.g. tar --sparse, rsync --sparse.

Note that you can explicitly convert suitably aligned zero blocks in a file into holes with cp --sparse=always, and do the reverse, converting holes into real zeros, with cp --sparse=never.
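A round trip through both options might look like the sketch below. It assumes GNU `cp` and GNU `stat` (the `--sparse` option is a GNU extension), and the file names are made up for the example.

```shell
# Assumes GNU coreutils (cp --sparse, stat -c). Paths are illustrative.
tmp=$(mktemp -d)

# 1 MiB of real zero bytes, written out in full.
dd if=/dev/zero of="$tmp/dense" bs=4096 count=256 2>/dev/null

cp --sparse=always "$tmp/dense" "$tmp/holes"    # runs of zeros may become holes
cp --sparse=never  "$tmp/holes" "$tmp/filled"   # holes are written back as real zeros

# The apparent sizes stay the same; only the allocated block counts may differ.
for f in dense holes filled; do
    stat -c '%n: size=%s bytes, allocated=%b blocks' "$tmp/$f"
done
```

All three files have identical contents and the same apparent size; on a filesystem that supports holes, the copy made with --sparse=always should report far fewer allocated blocks than the other two.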
