Fast way to copy a large file on a LAN

file-copy networking nfs tcp

I am having some trouble with NFS, and I'd like to try using just plain old TCP.

I have no idea where to begin, though.

Hardware-wise, I am using an ethernet crossover cable to network two netbooks.

To network them, I type

$ sudo ifconfig eth0 192.168.1.1 up && ping -c 10 -s 10 192.168.1.2 && sudo /etc/init.d/nfs-kernel-server start

on the first netbook and

$ sudo ifconfig eth0 192.168.1.2 up
$ ping -c 10 -s 10 192.168.1.1
$ mount /mnt/network1

on the second

where /mnt/network1 is specified in /etc/fstab as

192.168.1.1:/home /mnt/network1 nfs noauto,user,exec,soft,nfsvers=2 0 0

as well as in /etc/exports (using the syntax of that file), on the first netbook.
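
For what it's worth, the corresponding /etc/exports entry follows the usual pattern for that file (the options below are just an illustration, not necessarily my exact line):

/home 192.168.1.2(rw,sync,no_subtree_check)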

The above works fine, but the files and directories are huge. The files average about half a gigabyte apiece, and the directories are all between 15 and 50 gigabytes.

I'm using rsync to transfer them, and the command (on 192.168.1.2) is

$ rsync -avxS /mnt/network1 ~/somedir

I'm not sure if there's a way to tweak my NFS settings to handle huge files better, but I'd like to see if running an rsync daemon over plain old TCP works better than rsync over NFS.

So, to reiterate, how do I set up a similar network with TCP?

UPDATE:

So, after a good few hours of attempting to pull myself out of the morass of my own ignorance (or, as I like to think of it, pulling myself up by my own bootstraps), I came up with some useful facts.

But first of all, what led me on this rabbit trail instead of simply accepting the current best answer was this: nc is an unbelievably cool program that resolutely fails to work for me. I've tried the netcat-openbsd and netcat-traditional packages with no luck whatsoever.

The error I get on the receiving machine (192.168.1.2) is:

me@netbook:~$ nc -q 1 -l -p 32934 | tar xv
Can't grab 0.0.0.0:32934 with bind
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
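
(My working guess so far: either something else already had that port, which something like

$ ss -tln | grep 32934

should confirm or rule out, or the two packages just want different syntax; the OpenBSD variant is normally invoked as nc -l 32934, without the -p.)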

route gives:

me@netbook:~$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         dir-615         0.0.0.0         UG    0      0        0 wlan0
link-local      *               255.255.0.0     U     1000   0        0 eth0
192.168.0.0     *               255.255.255.0   U     2      0        0 wlan0
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0

But, here's the good news: having the static IP addresses set in /etc/network/interfaces, which I started doing while trying to get nc working, fixed all my NFS problems and rekindled my love for NFS.

The exact configuration I used (with 192.168.1.1 for the first netbook, of course) was:

auto eth0
iface eth0 inet static
address 192.168.1.2
netmask 255.255.255.0

With those settings, the two netbooks will be able to ping each other directly after being booted up, without even an ifup.

Anyway, I'd still really like to see nc in action, so I'm hoping someone can help me debug this process.

Best Answer

The quick way

The quickest way to transfer files over a LAN is likely not rsync, unless there are few changes. rsync spends a fair bit of time doing checksums, calculating differences, etc. If you know that you're going to be transferring most of the data anyway, just do something like this (note: there are multiple implementations of netcat; check the manual for the correct options. In particular, yours might not want the -p):

user@dest:/target$ nc -q 1 -l -p 1234 | tar xv

user@source:/source$ tar cv . | nc -q 1 dest-ip 1234

That uses netcat (nc) to send tar over a raw TCP connection on port 1234. There is no encryption, authenticity checking, etc., so it's very fast. If your cross-connect is running at gigabit or less, you'll peg the network; if it's more, you'll peg the disk (unless you have a storage array, or fast disk). The v flags to tar make it print file names as it goes (verbose mode). With large files, that's practically no overhead. If you were doing tons of small files, you'd turn that off. Also, you can insert something like pv into the pipeline to get a progress indicator:

user@dest:/target$ nc -q 1 -l -p 1234 | pv -pterb -s 100G | tar xv

You can of course insert other things into the pipeline too, like gzip -1 (and add the z flag to tar on the receiving end; using the z flag on the sending end instead would use a higher compression level than 1, unless you set the GZIP environment variable, of course). Though gzip will probably actually be slower, unless your data really compresses.
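
For example, the compressed variant of the sketch above would look like this (same caveats about your netcat's options):

user@dest:/target$ nc -q 1 -l -p 1234 | tar xvz

user@source:/source$ tar cv . | gzip -1 | nc -q 1 dest-ip 1234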

If you really need rsync

If you're really only transferring a small portion of the data that has changed, rsync may be faster. You may also want to look at the -W/--whole-file option, as with a really fast network (like a cross-connect) that can be faster.
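
As a sketch (using the same host and path placeholders as the ssh example below; -W just disables the delta-transfer algorithm and sends whole files):

user@source:~$ rsync -W -avP /source/ user@dest-ip:/target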

The easiest way to run rsync is over ssh. You'll want to experiment with ssh ciphers to see which is fastest; it'll be either AES, ChaCha20, or Blowfish (though there are some security concerns with Blowfish's 64-bit block size), depending on whether your chip has Intel's AES-NI instructions (and whether your OpenSSL uses them). On a new enough ssh, rsync-over-ssh looks like this:

user@source:~$ rsync -e 'ssh -c aes128-gcm@openssh.com' -avP /source/ user@dest-ip:/target

For older ssh/sshd, try aes128-ctr or aes128-cbc in place of aes128-gcm@openssh.com.

ChaCha20 would be chacha20-poly1305@openssh.com (also needs a new enough ssh/sshd) and Blowfish would be blowfish-cbc. OpenSSH does not allow running without a cipher. You can of course use whichever rsync options you like in place of -avP. And of course you can go the other direction, and run the rsync from the destination machine (pull) instead of the source machine (push).
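
To see which ciphers your build actually offers (on reasonably recent OpenSSH; the server has to support the chosen cipher too), you can ask the client directly:

user@source:~$ ssh -Q cipher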

Making rsync faster

If you run an rsync daemon, you can get rid of the crypto overhead. First, you'd create a daemon configuration file (/etc/rsyncd.conf), for example on the source machine (read the rsyncd.conf manpage for details):

[big-archive]
    path = /source
    read only = yes
    uid = someuser
    gid = somegroup

Then, on the destination machine, you'd run:

user@dest:~$ rsync -avP source-ip::big-archive/ /target

You can do this the other way around too (but of course you'll need to set read only to no). There are options for authentication, etc.; check the manpage for details.
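
Note that nothing starts the daemon for you unless your distribution already does so via inetd or an init script; if not, something like this on the source machine should do it (rsync --daemon reads /etc/rsyncd.conf by default and listens on TCP port 873):

user@source:~$ sudo rsync --daemon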
