Ubuntu – How to speed up rsync/tar of large Maildir

disk-management filesystem performance rsync

I have a very large Maildir that I am copying to a new machine (over 100BASE-T) with rsync. The progress is slow. VERY SLOW. Like 1 MB/s slow. I think this is because it consists of a lot of small files that are being read in an order that is essentially random with respect to where their blocks are stored on disk, causing a massive seek storm. I get similar results when trying to tar the directory. Is there a way to get rsync/tar to read in disk block order, or otherwise overcome this problem?

Edit: I tried tar cf /dev/zero Maildir/ on the old system, and it took 30 minutes! On the new system, once the rsync finally finished, the same test took 18 minutes. Running dump on the same directory on the old system took 8 minutes, and on the new system, dump -0f /dev/zero -b 1024 /home/psusi/Maildir/ finished in only 30 seconds.

Best Answer

I ended up writing a little Python script to calculate the correlation between directory names and inodes, between inodes and data blocks, and between directory names and data blocks. It turns out that ext4 tends to have rather poor correlation between the order in which file names appear in the directory and where the files are stored on disk. After discussing it on the ext4 mailing list, I learned that this is the result of the hashed directory indexes used to speed up lookups in large directories. The names are stored in hash order, which effectively scrambles their order relative to anything else.
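That script is not reproduced here. As a rough illustration of the first of those measurements only, the sketch below compares the order in which a directory returns its entries with their inode numbers. It assumes a Pearson correlation is a good enough measure, and the function names (pearson, readdir_inode_correlation) are made up for this example. Looking up data block addresses needs filesystem-specific calls and usually root, so that part is left out.

```python
import os


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number sequences."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / (var_x * var_y) ** 0.5


def readdir_inode_correlation(directory):
    """Correlate the position of each entry in readdir order with its inode.

    A value near 1.0 means the directory hands back names roughly in inode
    order; a value near 0.0 means the two orders are unrelated.
    """
    with os.scandir(directory) as entries:
        # os.scandir yields entries in the order the filesystem returns them.
        inodes = [entry.inode() for entry in entries]
    positions = list(range(len(inodes)))
    return pearson(positions, inodes)


# Hypothetical usage; "cur" is one of the standard Maildir subdirectories:
# print(readdir_inode_correlation("/home/psusi/Maildir/cur"))
```

A result close to zero on a large directory would be consistent with the hash-order scrambling described above.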

It seems to me, and to at least one other commenter, that this is a deficiency in the filesystem that should be fixed. Ted Ts'o (the ext4 maintainer) feels that it would be too difficult to do in the filesystem, and that good tools (like rsync and tar) should have an option to sort the directory by inode number before reading the files.
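Until the tools grow such an option, the idea is easy to try by hand. The following is only a sketch of the sort-by-inode approach, not anything rsync or tar actually do; the function names are invented for illustration, and it handles a single directory without recursing into subdirectories.

```python
import os


def files_in_inode_order(directory):
    """Return paths of the regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as entries:
        pairs = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    pairs.sort()  # tuples sort by inode number first
    return [path for _inode, path in pairs]


def read_all(directory, chunk_size=1 << 20):
    """Read every file in inode order and return the total number of bytes read."""
    total = 0
    for path in files_in_inode_order(directory):
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


# Hypothetical usage on one Maildir subdirectory:
# print(read_all("/home/psusi/Maildir/cur"))
```

Timing read_all against a plain loop over os.listdir on a cold cache is one way to check whether inode order helps on a given disk. Inode order is only a proxy for data block order, so the gain is not guaranteed.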

So it looks like feature enhancement requests need to be filed for rsync and tar.
