Rsync hangs unless I defrag ext4 filesystem

I have an XBian server (Raspberry Pi version of Debian) running rsync via inetd (not the native dæmon). I am serving a couple of directories on an ext4 filesystem (on a USB disk) as individual modules; the modules in question each hold on the order of 100-500 GB of data in 1,000-10,000 files. Lately I have noticed that as I alter other parts of the filesystem (uploads, copies, etc., not necessarily in those directories), rsync calls to these modules time out.

For a routine rsync command like rsync -vrt rsync://host:port/module ./, where I would not expect any file transfer to be needed (i.e. the server and client locations already hold the same data), I see entries like these in the rsync server log file:

2014/12/15 22:59:59 [###] connect from UNKNOWN (1.1.1.1)
2014/12/15 22:59:59 [###] rsync on share/ from UNKNOWN (1.1.1.1)
2014/12/15 22:59:59 [###] building file list
2014/12/15 23:16:23 [###] rsync: read error: Connection timed out (110)
2014/12/15 23:16:23 [###] rsync error: error in socket IO (code 10) at io.c(785) [sender=3.1.1]

In the client logs, I see logs like these (yes, same transfer – the server reported timeout after 15 minutes while the client reported error after 30 minutes):

2014/12/15 23:00:01 [###] receiving file list
2014/12/15 23:29:26 [###] rsync: read error: Connection reset by peer (104)
2014/12/15 23:29:26 [###] rsync error: error in rsync protocol data stream (code 12) at /usr/src/ports/rsync/rsync-3.0.9-1/src/rsync-3.0.9/io.c(764) [Receiver=3.0.9]

Any number of issues could cause a situation like this. However, after defragmenting a couple of files because of other problems, I noticed that my rsync transfers began to complete successfully again. Then, after I uploaded some more files (again, to a directory outside of the rsync module), the timeouts returned. Now, whenever I see timeout errors in my logs, I defragment my system (with e4defrag) and can then run the rsync transfer successfully again.

A few additional notes:

My ext4 partition uses less than 50% of its available space at the moment
My rsync calls to other, smaller modules do not time out
Even calls without data transfer (e.g. rsync -rt rsync://host:port/module) time out in this state
After further testing, it seems that after defragmentation, I can run the rsync call successfully once before I need to defragment again (does an rsync call actually cause file fragmentation?)

Why does my rsync setup require a defragmentation each time and what can I do to ensure my rsync doesn't break on such a minor inconvenience any more?

Best Answer

Try a tar of the directory to /dev/null instead of a defrag. That will definitely not modify the disk, but it will get all of the inodes cached. For large directories containing lots of files, ext4 indexes the entries in a hash tree, so readdir() returns them in essentially random order. Trying to stat() them in that same order causes a lot of seeks, making it very, very slow.
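
As a rough illustration of the same idea, here is a minimal Python sketch that warms the inode cache without writing anything. It lists each directory, sorts the entries by inode number, and only then stats them, which is one common way to avoid the random-order seeks described above. The function name warm_inode_cache and the path in the usage note are made up for this example. It assumes that inode order roughly follows on-disk order, which is typical for ext4 but not guaranteed.

```python
import os


def warm_inode_cache(root):
    """Stat every entry under root, visiting each directory's entries
    in inode order rather than readdir() order.

    Hypothetical helper for illustration; returns the number of entries
    that were stat'ed. Nothing on disk is modified.
    """
    count = 0
    pending = [root]
    while pending:
        directory = pending.pop()
        try:
            with os.scandir(directory) as it:
                # The inode number comes from the directory entry itself,
                # so sorting does not require a stat() per file.
                entries = sorted(it, key=lambda e: e.inode())
        except OSError:
            continue  # unreadable directory: skip it
        for entry in entries:
            try:
                # This is the call that pulls the inode into the cache.
                entry.stat(follow_symlinks=False)
            except OSError:
                continue  # entry vanished or is unreadable
            count += 1
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)
    return count
```

Calling warm_inode_cache("/path/to/module") (a placeholder path) shortly before the rsync run would be one way to try this. Whether the cached inodes survive until the client connects depends on how much memory the Pi has free.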
