Linux – Make Linux write to network filesystem concurrently with local disk reads

Tags: cache, cifs, io, linux

Summary

How can you configure Linux to read from a local disk/filesystem and write to a network share at the same time, rather than reading while the network is idle and then sending that data over the network while the local disk is idle?

It is much faster to read and write at the same time than to alternate between the two operations.

Details

I am moving a large amount of data from local disks on a Linux machine to a NAS device.

I am using rsync to copy /srv/data into /mnt/nas, which is a CIFS mount.
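For reference, the setup looks roughly like this (the server and share names are placeholders):

    # NAS share mounted over CIFS (server/share names are examples only)
    mount -t cifs //nas/backup /mnt/nas -o username=backup
    # Copy everything, preserving permissions and timestamps
    rsync -a --progress /srv/data/ /mnt/nas/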

It started off well, reading at 100MB/sec and writing to the NAS at 100MB/sec (limit of gigabit network), with both reading and writing happening simultaneously.

However now, a few hours later, I am finding that it is reading from the local disk, then stopping the read while it writes to the NAS, then when there is no more data to write to the NAS, it resumes reading from the disk again. The network is idle while the disk is being read, and the disk is idle while the network is in use.

Needless to say, reading 200MB then writing 200MB takes much longer than reading and writing that 200MB at the same time.

How can I configure the kernel so that it sticks to the earlier behaviour of reading and writing at the same time, rather than alternating between reads and writes, performing only one operation at a time?

Some observations: when the local disk reads at 100+MB/sec, everything seems to happen in parallel just fine; once the disk slows down (it seems to be managing only 20MB/sec now, for some reason), the read/write switching starts.

I can also run sync manually every few seconds to get the writes happening in parallel with the reads (though obviously at the reduced speeds); however, putting sync in a while loop so that it runs every five seconds doesn't seem like the right solution…
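For completeness, the stopgap I mean is nothing more than this (a crude hack, not a fix):

    # Flush the page cache every five seconds so writes overlap
    # with reads instead of piling up - crude workaround only
    while true; do
        sync
        sleep 5
    done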

The kernel seems to cache about 1GB of data and then write it out over the network as fast as possible – which is fine – but I just don't understand why the slow disk has to stop being read while the data is being sent out over the network.
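That roughly 1GB figure is governed by the kernel's dirty-page thresholds, which can be inspected and capped via sysctl. A sketch of what I mean (the byte values below are arbitrary examples, not recommendations):

    # Show the current dirty-cache thresholds (percent of RAM by default)
    sysctl vm.dirty_background_ratio vm.dirty_ratio

    # Cap dirty data in bytes instead, so writeback starts sooner;
    # example values only - setting *_bytes overrides the *_ratio knobs
    sysctl -w vm.dirty_background_bytes=$((64 * 1024 * 1024))
    sysctl -w vm.dirty_bytes=$((256 * 1024 * 1024))

This caps how much the kernel buffers before writeback kicks in, though it would not by itself explain why the reads stop entirely.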

Best Answer

After some more investigation, it looks like this issue is less kernel-related and more about how rsync and CIFS interact.

As far as I can make out, what is happening is that when rsync closes the destination file, CIFS (and probably any network filesystem) ensures the file is completely flushed and written to the remote disk before the close syscall returns. This is to assure any application that once the close operation completes successfully, the file has been completely saved and there is no risk of any further errors that could cause data loss.

If this weren't done, an application could close a file and exit believing the save had succeeded, and only later (perhaps due to a network problem) would the data turn out not to have been written after all; by then it would be too late for the application to do anything about it, such as asking the user whether to save the file somewhere else instead.

This requirement means that every time rsync finishes copying a file, the entire write buffer for that file must be flushed over the network before the close returns and rsync can continue reading the next file.
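You can see the stall for yourself by timing the close syscalls (assuming strace is available):

    # -T prints the time spent in each syscall; long close() times on the
    # destination files show the flush happening inside close itself
    strace -f -T -e trace=close rsync -a /srv/data/ /mnt/nas/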

A workaround is to mount the CIFS share with the option cache=none, which disables this behaviour and causes all I/O to go directly to the server. This eliminates the problem and allows reads and writes to run in parallel; the drawback is somewhat lower throughput. In my case, the network transfer speed drops from 110MB/sec to 80MB/sec.
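Concretely (again with placeholder server and share names):

    # Remount with the CIFS cache disabled so writes go straight to
    # the server instead of being flushed in bulk at close()
    mount -t cifs //nas/backup /mnt/nas -o username=backup,cache=none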

This may mean that if you are copying large files, performance may well be better with the alternating read/write behaviour. With many smaller files, disabling the cache avoids the flush on every close, so performance may increase there.

It seems rsync needs an option to close its file handles in another thread, so it could start reading the next file while the previous one is still being flushed.
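In the absence of such an option, one rough way to approximate the overlap is to run several rsync processes over disjoint subtrees, so that one process's flush-on-close overlaps another's reads. A sketch, assuming /srv/data holds multiple top-level directories with simple names:

    # Run up to four rsync jobs in parallel, one per top-level directory;
    # assumes directory names without whitespace - illustration only
    ls /srv/data | xargs -P 4 -I{} rsync -a /srv/data/{}/ /mnt/nas/{}/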

EDIT: I have confirmed that cache=none definitely helps when transferring lots of small files (bringing the rate from 10MB/sec up to 80MB/sec), but when transferring large files (1GB+) it drops the transfer from 110MB/sec down to the same 80MB/sec. This suggests that the slow transfer with many small files is caused less by the source disk seeking and more by the sheer number of cache flushes from all the small files.
