Copying faster than cp

Tags: cp, file-copy, recursive, rsync, solaris

I am currently copying a large number of directories and files recursively on the same disk using cp -r.

Is there a way to do this more quickly? Would compressing the files first be better, or maybe using rsync?

Best Answer

I was recently puzzled by the sometimes slow speed of cp. Specifically, how could df = pandas.read_hdf('file1', 'df') (700ms for a 1.2GB file) followed by df.to_hdf('file2') (530ms) be so much faster than cp file1 file2 (8s)?

Digging into this (a sketch for reproducing these runs follows the list):

  • cat file1 > file2 isn't any better (8.1s).
  • dd bs=1500000000 if=file1 of=file2 is no better either (8.3s).
  • rsync file1 file2 is worse (11.4s), because file2 already exists, so rsync tries to do its rolling-checksum and block-update magic.
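
For reference, a minimal sketch of how such runs can be timed, assuming bash and a large test file; file1 and file2 are placeholder names, and the commented figures are the ones reported above:

    # Target already exists in each run, which is the slow path measured above.
    cp file1 file2                               # create the pre-existing target
    time cp file1 file2                          # ~8s
    time cat file1 > file2                       # ~8.1s
    time dd bs=1500000000 if=file1 of=file2      # ~8.3s
    time rsync file1 file2                       # ~11.4s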

Oh, wait a second! How about unlinking (deleting) file2 first if it exists?

Now we are talking:

  • rm -f file2: 0.2s (to add to any figure below).
  • cp file1 file2: 1.0s.
  • cat file1 > file2: 1.0s.
  • dd bs=1500000000 if=file1 of=file2: 1.2s.
  • rsync file1 file2: 4s.

So there you have it. Make sure the target files don't exist (or truncate them, which is presumably what pandas.to_hdf() does).
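
As a minimal sketch of that takeaway, assuming bash (the copy_fresh name and file names are mine, not from the original commands):

    # Unlink the target before copying, so cp always writes a fresh file.
    copy_fresh() {
        rm -f -- "$2"     # delete the target if it exists (or truncate it: : > "$2")
        cp -- "$1" "$2"
    }
    copy_fresh file1 file2

On the numbers above, the extra rm -f costs ~0.2s and brings cp from 8s down to ~1.2s in total.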

Edit: these timings were taken without dropping the page cache before each command, but as noted in the comments, doing so just consistently adds ~3.8s to all the numbers above.
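
For anyone repeating the measurements, one common way to empty the page cache between runs on Linux (this needs root; the exact command is my addition rather than one quoted from the comments):

    sync                                          # flush dirty pages to disk first
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null   # drop page cache, dentries, and inodes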

Also noteworthy: this was tried on several Linux setups (CentOS with a 2.6.18-408.el5 kernel, and Ubuntu with a 3.13.0-77-generic kernel), on both ext4 and ext3. Interestingly, on a MacBook running macOS 10.12.6, there is no difference: both cases (with or without an existing file at the destination) are fast.
