I am a graduate student, and the group in which I work maintains a Linux cluster. Each node of the cluster has its own local disk, but these local disks are relatively small and are not equipped with automatic backup, so the group owns a fileserver with many TBs of storage space. I am a relative Linux novice, so I am not sure what the fileserver's specs are in terms of speed, networking capability, etc. I do know from experience that the local disks are significantly faster than the fileserver in terms of I/O. About a dozen people use the fileserver.
Using cp to copy a ~20 GB file from the fileserver to one of the local disks takes about 11.5 minutes of real time on average (according to time). I know that this cp operation is not very efficient, because (1) time tells me that the system time for such a copy is only ~45 seconds, and (2) when I examine top during the copy, %CPU is quite low (by inspection, roughly 0-10% on average).
Using cp to copy the same ~20 GB file from one folder on the local disk to another folder on the same local disk takes less time: about 9 minutes of real time (~51 seconds of system time, according to time). So apparently the fileserver is somewhat slower than the local disk, as expected, but perhaps not significantly slower. I am surprised that even the local-to-local copy takes as long as 9 minutes.
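Working those two timings out as throughput makes the comparison concrete (treating ~20 GB as 20 × 1024 MiB and the times as 690 s and 540 s; the awk one-liners are just for the arithmetic):

```shell
# Effective throughput of each copy, in MiB/s.
awk 'BEGIN { printf "fileserver -> local disk: %.1f MiB/s\n", 20 * 1024 / 690 }'  # prints 29.7
awk 'BEGIN { printf "local disk -> same disk:  %.1f MiB/s\n", 20 * 1024 / 540 }'  # prints 37.9
```

Note that a local-to-local copy on one disk reads and writes the same spindle, so it effectively moves data at twice those rates internally; that is one reason it is not dramatically faster than the network copy.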
I need to copy ~200 large files, each ~20 GB, from the fileserver to one of the local disks. So, my question is: is there a faster alternative to cp for copying large files in Linux? (Or are there any flags within cp that I could use which would speed up copying?) Even if I could somehow shave a minute off this copying time, that would help immensely.
I am sure that buying new, faster hard disks would speed things up, but I don't have access to such resources. I am also not a system administrator, only a (novice) user, so I don't have access to more detailed information on the load that is on the disks. I do know that while about a dozen people use the fileserver daily, I am the only person using this particular node/local disk.
Best Answer
%CPU should be low during a copy. The CPU tells the disk controller "grab data from sectors X–Y into the memory buffer at Z", then goes and does something else (or sleeps, if there is nothing else to do). The hardware triggers an interrupt when the data is in memory. Then the CPU has to copy it a few times, and tells the network card "transmit the packets at memory locations A, B, and C". Then it goes back to doing something else.
You're pushing ~240 Mbps. On a gigabit LAN, you ought to be able to do at least 800 Mbps, but that bandwidth is shared among everyone using the file server (and any switches in between), and it is also limited by how fast the file server itself can read from its disks, which is likewise shared by everyone using it.
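That ~240 Mbps figure comes straight from the numbers in the question: ~20 GB in ~11.5 minutes. A quick check, treating 20 GB as 20 × 1024 MiB:

```shell
# 20 GiB in 690 s, expressed in megabits per second.
awk 'BEGIN { printf "%.0f Mbps\n", 20 * 1024 * 8 / 690 }'  # prints 237 Mbps
```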
For tracking down the bottleneck, iostat -kx 10 is going to be a useful command. It'll show you the utilization of your local hard disks, and if you can run it on the file server, it'll tell you how busy the file server is.

The general solution is going to be to speed up that bottleneck, which of course you don't have the budget for. But there are a couple of special cases where you can find a faster approach:
- If the files are compressible and you have CPU to spare, a minimal on-the-fly compression might be quicker: something like lzop, or maybe gzip --fast.
- If you only change a few bits here and there and then send the file back, sending just the deltas would be much faster. Unfortunately, rsync won't really help here, as it will need to read the file on both sides to find the delta. Instead, you need something that keeps track of the delta as you change the file... Most approaches here are app-specific, but it's possible that you could rig something up with, e.g., device-mapper (see the brand new dm-era target) or btrfs.

And, since you note you're not the sysadmin, I'm guessing that means you have a sysadmin, or at least someone responsible for the file server and network. You should probably ask them; they should be much more familiar with the specifics of your setup. Your sysadmin(s) should at least be able to tell you what transfer rate you can reasonably expect.
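As a sketch of the on-the-fly compression idea: compress with the fastest setting on the sending side, stream through a pipe, and decompress on the receiving side. Over the network it would look something like this (the hostname and paths are hypothetical, and it only pays off if the data compresses well and the network, not the CPU, is the bottleneck):

```shell
# Network form (hypothetical host/paths; shown as a comment, not run here):
#   ssh fileserver 'gzip -1 -c /data/big_file.dat' | gzip -dc > /local/scratch/big_file.dat

# The same pipeline demonstrated locally on scratch files in /tmp:
printf 'compressible data %.0s' $(seq 1 10000) > /tmp/source.dat
gzip -1 -c /tmp/source.dat | gzip -dc > /tmp/copy.dat
cmp /tmp/source.dat /tmp/copy.dat && echo "copies match"  # prints "copies match"
```

The -1 flag (equivalently --fast) trades compression ratio for speed, which is the right trade-off when the goal is throughput rather than disk savings; lzop makes that trade-off even more aggressively.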