Why is the fragmentation level so high in files that contain other filesystems?

dd ext4 filesystems sparse-files

I've just found out what sparse files are and wanted to run some experiments on them. On the wiki you can read that such files can get fragmented easily, and I wanted to check how bad that is. I created a file in the following way:

# truncate -s 10G sparse-file
# mkfs.ext4 -m 0 -L sparse ./sparse-file
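
As a quick sanity check that truncate really produces a sparse file, the apparent size can be compared with the space actually allocated. This is only a minimal sketch: it assumes GNU coreutils (truncate, stat -c), and the file name sparse-demo is made up for illustration.

```shell
# Sketch: apparent size vs. allocated size of a freshly truncated file.
# The file name "sparse-demo" is hypothetical; GNU coreutils are assumed.
tmp=$(mktemp -d)
truncate -s 10M "$tmp/sparse-demo"

# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
apparent=$(stat -c %s "$tmp/sparse-demo")
allocated=$(( $(stat -c %b "$tmp/sparse-demo") * $(stat -c %B "$tmp/sparse-demo") ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"
rm -r "$tmp"
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent one until data is written into the file.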

I mounted the sparse file, and I put a 600M file in it. The fragmentation level looks like this:

# filefrag -v "/media/Grafi/sparse-file"
Filesystem type is: ef53
File size of /media/Grafi/sparse-file is 10737418240 (2621440 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    1032:      36864..     37896:   1033:
   1:     1043..    1043:      37907..     37907:      1:
   2:     1059..    1059:      37923..     37923:      1:
   3:     9251..    9256:      46115..     46120:      6:
   4:    32768..   32770:      51200..     51202:      3:      69632:
   5:    34816..   55295:      77824..     98303:  20480:      53248:
   6:    55296..   57343:     114688..    116735:   2048:      98304:
   7:    57344..   69631:     120832..    133119:  12288:     116736:
   8:    69632..   81919:     102400..    114687:  12288:     133120:
   9:    81920..   98303:     135168..    151551:  16384:     114688:
  10:    98304..   98306:      57344..     57346:      3:     151552:
  11:   100352..  112639:     151552..    163839:  12288:      59392:
  12:   112640..  145407:     165888..    198655:  32768:     163840:
  13:   145408..  163839:     198656..    217087:  18432:
  14:   163840..  163842:      40960..     40962:      3:     217088:
  15:   165888..  178175:     217088..    229375:  12288:      43008:
  16:   178176..  202751:     231424..    255999:  24576:     229376:
  17:   202752..  206847:     258048..    262143:   4096:     256000:
  18:   206848..  216756:     276480..    286388:   9909:     262144:
  19:   229376..  229378:      43008..     43010:      3:     299008:
  20:   294912..  294914:      53248..     53250:      3:     108544:
  21:   524288..  524288:      55296..     55296:      1:     282624:
  22:   819200..  819202:      61440..     61442:      3:     350208:
  23:   884736..  884738:      63488..     63490:      3:     126976:
  24:  1048576.. 1048577:      67584..     67585:      2:     227328:
  25:  1081344.. 1081391:      69632..     69679:     48:     100352:
  26:  1572864.. 1572864:      71680..     71680:      1:     561152:
  27:  1605632.. 1605634:      73728..     73730:      3:     104448:
  28:  2097152.. 2097152:      75776..     75776:      1:     565248:
  29:  2097167.. 2097167:      75791..     75791:      1:             last
/media/Grafi/sparse-file: 25 extents found

I thought it was because of the "sparse" feature, but it looks like all files that have other filesystems in them get fragmented in this way. Take a look at the following example:

Create a file full of zeroes:

# dd if=/dev/zero of=./zero bs=1M count=2048 

Check its fragmentation level:

# filefrag -v /media/Grafi/zero
Filesystem type is: ef53
File size of /media/Grafi/zero is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   32767:    6172672..   6205439:  32768:
   1:    32768..   65535:    6205440..   6238207:  32768:
   2:    65536..   98303:    6238208..   6270975:  32768:
   3:    98304..  118783:    6270976..   6291455:  20480:
   4:   118784..  151551:    6324224..   6356991:  32768:    6291456:
   5:   151552..  184319:    6356992..   6389759:  32768:
   6:   184320..  217087:    6389760..   6422527:  32768:
   7:   217088..  249855:    6422528..   6455295:  32768:
   8:   249856..  282623:    6455296..   6488063:  32768:
   9:   282624..  315391:    6488064..   6520831:  32768:
  10:   315392..  348159:    6520832..   6553599:  32768:
  11:   348160..  380927:    6553600..   6586367:  32768:
  12:   380928..  413695:    6586368..   6619135:  32768:
  13:   413696..  446463:    6619136..   6651903:  32768:
  14:   446464..  479231:    6651904..   6684671:  32768:
  15:   479232..  511999:    6684672..   6717439:  32768:
  16:   512000..  524287:    6717440..   6729727:  12288:             last,eof
/media/Grafi/zero: 2 extents found

So basically, filefrag lists 17 extents for this file, but from a human perspective the file consists of two contiguous chunks.
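
The gap between the number of listed rows and the "extents found" total can be reproduced with a small counting rule. The sketch below is my reading of the output, not filefrag's actual source. It starts a new fragment only when an extent's physical start matches neither the block right after the previous extent nor the previous physical start shifted by the logical gap. The helper name count_fragments is hypothetical. Applied to the three listings in this question, the rule gives 25, 2, and 8, matching the totals filefrag printed.

```shell
# Sketch: count non-contiguous fragments in `filefrag -v` output read from stdin.
# count_fragments is a hypothetical helper, not part of e2fsprogs.
count_fragments() {
  awk '
    /^ *[0-9]+: +[0-9]+\.\./ {
      gsub(/\.\.|:/, " ")   # turn "0..  257:" into plain whitespace-separated fields
      logical = $2; physical = $4; len = $6
      if (seen) {
        sparse = prev_physical + (logical - prev_logical)  # contiguous across a hole
        dense  = prev_physical + prev_len                  # directly after the previous extent
        if (physical != sparse && physical != dense) fragments++
      } else {
        seen = 1; fragments = 1
      }
      prev_logical = logical; prev_physical = physical; prev_len = len
    }
    END { print fragments + 0 }
  '
}
```

Usage would look like `filefrag -v /media/Grafi/zero | count_fragments`.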

Now create a filesystem in this file:

# mkfs.ext4 -m 0 -L ext /media/Grafi/zero

Check its fragmentation again:

# filefrag -v /media/Grafi/zero

Filesystem type is: ef53
File size of /media/Grafi/zero is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     257:    5505024..   5505281:    258:
   1:      265..     265:    5505289..   5505289:      1:
   2:      272..     273:    5505296..   5505297:      2:
   3:      289..     289:    5505313..   5505313:      1:
   4:     8481..    8486:    5507361..   5507366:      6:    5513505:
   5:    32768..   32769:    5509120..   5509121:      2:    5531648:
   6:    98304..   98305:    5511168..   5511169:      2:    5574656:
   7:   163840..  163841:    5513216..   5513217:      2:    5576704:
   8:   229376..  229377:    5515264..   5515265:      2:    5578752:
   9:   262144..  262144:    5517312..   5517312:      1:    5548032:
  10:   294912..  294913:    5519360..   5519361:      2:    5550080: last
/media/Grafi/zero: 8 extents found

Does anyone know what actually happened here? Why did the file get fragmented when a filesystem was created on it? And what happened to the extent lengths?

Added:

The mkfs.ext4 option -E nodiscard doesn't fix this. With it, I can still see the original structure of the file in filefrag (the zeroed blocks), but after the filesystem is created the file ends up fragmented anyway. Maybe the filesystem metadata being written does something to the zeroed file; I don't know. When I watch the filefrag output, there are always 6 additional extents (for the 2G file). Maybe that's the superblock and its 5 backups? But this still doesn't explain why the whole file is fragmented; it's still the same file.

There's another thing. When I recreate the filesystem in this file:

# mkfs.ext4 -Enodiscard /media/Grafi/zero
mke2fs 1.43 (17-May-2016)
/media/Grafi/zero contains a ext4 file system
        created on Thu Jun  2 13:02:28 2016
Proceed anyway? (y,n) y
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 6d58dddc-439b-4175-9af6-8628f0d2a278
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

The added extents magically disappear.

Best Answer

It looks like this was a bug in mke2fs that caused it to punch holes with fallocate(fd, FALLOC_FL_PUNCH_HOLE, ...) instead of zeroing the range in place (for example with fallocate(fd, FALLOC_FL_ZERO_RANGE, ...)) when zeroing out the space in the inode tables, even when -E nodiscard was used.
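
The effect of hole punching on an ordinary file can be seen without mke2fs. The following is only an illustrative sketch, not the code path mke2fs uses. It assumes the util-linux fallocate command and a filesystem that supports FALLOC_FL_PUNCH_HOLE, and the file name punch-demo is made up.

```shell
# Sketch: punching a hole deallocates blocks but keeps the file size.
# Assumes util-linux fallocate and a filesystem that supports hole punching;
# the file name "punch-demo" is hypothetical.
tmp=$(mktemp -d)
f="$tmp/punch-demo"

# Write 8 MiB of non-zero data and flush it so the blocks are really allocated.
dd if=/dev/urandom of="$f" bs=1M count=8 conv=fsync 2>/dev/null

size_before=$(stat -c %s "$f")
blocks_before=$(stat -c %b "$f")

# Deallocate 4 MiB in the middle of the file (--punch-hole implies --keep-size).
fallocate --punch-hole --offset 1MiB --length 4MiB "$f" \
  || echo "hole punching not supported here"

size_after=$(stat -c %s "$f")
blocks_after=$(stat -c %b "$f")

echo "size:   $size_before -> $size_after bytes"
echo "blocks: $blocks_before -> $blocks_after"
rm -r "$tmp"
```

Where hole punching is supported, the size should stay the same while the block count drops. A previously fully allocated file turning sparse like this is the same kind of change seen in the filefrag output after mkfs.ext4.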

I submitted a bug report to the upstream linux-ext4@vger.kernel.org mailing list after verifying this behaviour locally, and got a patch within an hour, with the subject:

e2fprogs: block zero/discard cleanups

The patches should be included in the e2fsprogs-1.45 release, and likely in a 1.44.x maintenance release. If you want them in a vendor e2fsprogs release, I'd recommend patching and building your own e2fsprogs to verify that the fix works for you, reporting success to linux-ext4 so that the patches land sooner, and then submitting a bug report to your distro of choice so that it pulls the upstream patches into its releases.
