The quick way
The quickest way to transfer files over a LAN is likely not rsync, unless there are few changes. rsync spends a fair bit of time doing checksums, calculating differences, etc. If you know that you're going to be transferring most of the data anyway, just do something like this (note: there are multiple implementations of netcat; check the manual for the correct options. In particular, yours might not want the -p):
user@dest:/target$ nc -q 1 -l -p 1234 | tar xv
user@source:/source$ tar cv . | nc -q 1 dest-ip 1234
That uses netcat (nc) to send tar over a raw TCP connection on port 1234. There is no encryption, authenticity checking, etc., so it's very fast. If your cross-connect is running at gigabit or less, you'll peg the network; if it's more, you'll peg the disk (unless you have a storage array, or fast disk). The v flag to tar makes it print file names as it goes (verbose mode). With large files, that's practically no overhead; if you were doing tons of small files, you'd turn it off. You can also insert something like pv into the pipeline to get a progress indicator:
user@dest:/target$ nc -q 1 -l -p 1234 | pv -pterb -s 100G | tar xv
You can of course insert other things too, like gzip -1 (and add the z flag on the receiving end; using the z flag on the sending end would use a higher compression level than 1, unless you set the GZIP environment variable). Though gzip will probably actually be slower, unless your data really compresses.
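As a sketch of the compressed variant (assuming GNU tar and a traditional netcat; the /tmp paths below are only for illustration), the sender pipes through gzip -1 and the receiver lets tar's z flag decompress:

```shell
# The compressed pipeline from above would look like (sketch):
#   sender:   tar cv . | gzip -1 | nc -q 1 dest-ip 1234
#   receiver: nc -q 1 -l -p 1234 | tar xvz
# Local demonstration of the same pipeline shape, with no network involved:
mkdir -p /tmp/gzdemo/src /tmp/gzdemo/dst
echo "hello" > /tmp/gzdemo/src/file.txt
(cd /tmp/gzdemo/src && tar cf - . | gzip -1) | (cd /tmp/gzdemo/dst && tar xzf -)
cat /tmp/gzdemo/dst/file.txt
```

The -f - is written out explicitly here because not every tar defaults to stdin/stdout.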
If you really need rsync
If you're really only transferring a small portion of the data that has changed, rsync may be faster. You may also want to look at the -W/--whole-file option, since on a really fast network (like a cross-connect) sending whole files can beat the delta algorithm.
The easiest way to run rsync is over ssh. You'll want to experiment with ssh ciphers to see which is fastest; it will be either AES, ChaCha20, or Blowfish (though there are some security concerns with Blowfish's 64-bit block size), depending on whether your chip has Intel's AES-NI instructions (and your OpenSSL uses them). On a new enough ssh, rsync-over-ssh looks like this:
user@source:~$ rsync -e 'ssh -c aes128-gcm@openssh.com' -avP /source/ user@dest-ip:/target
For older ssh/sshd, try aes128-ctr or aes128-cbc in place of aes128-gcm@openssh.com. ChaCha20 would be chacha20-poly1305@openssh.com (which also needs a new enough ssh/sshd) and Blowfish would be blowfish-cbc. OpenSSH does not allow running without a cipher. You can of course use whichever rsync options you like in place of -avP. And of course you can go the other direction, and run the rsync from the destination machine (pull) instead of the source machine (push).
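One way to compare ciphers is to time a bulk transfer of throwaway data through each one. This sketch only generates the timing commands ("user" and "dest-ip" are placeholders, and the cipher list is just an example); run them by hand against your real destination:

```shell
# Generate one timing command per candidate cipher: pipe 100 MiB of zeros
# through ssh and discard it remotely, so mostly crypto and network are measured.
for c in aes128-gcm@openssh.com chacha20-poly1305@openssh.com aes128-ctr; do
  echo "time dd if=/dev/zero bs=1M count=100 | ssh -c $c user@dest-ip 'cat > /dev/null'"
done | tee /tmp/cipher-bench.sh
```

Then run sh /tmp/cipher-bench.sh on the source machine and compare the timings.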
Making rsync faster
If you run an rsync daemon, you can get rid of the crypto overhead. First, you'd create a daemon configuration file (/etc/rsyncd.conf), for example on the source machine (read the rsyncd.conf manpage for details):
[big-archive]
path = /source
read only = yes
uid = someuser
gid = somegroup
Then, on the destination machine, you'd run:
user@dest:~$ rsync -avP source-ip::big-archive/ /target
You can do this the other way around too (but of course you'll need to set read only to no). There are options for authentication, etc.; check the manpage for details.
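For the reverse (push) direction, the daemon would run on the destination machine instead, with a writable module. A sketch, where the module name, path, and IDs mirror the example above and are assumptions (the config is written to /tmp here only for illustration; the real file is /etc/rsyncd.conf):

```shell
# Hypothetical daemon config for the *destination* machine, writable module:
cat > /tmp/rsyncd.conf <<'EOF'
[big-archive]
path = /target
read only = no
uid = someuser
gid = somegroup
EOF
# From the source machine, you would then push with:
echo "rsync -avP /source/ dest-ip::big-archive/"
```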
With Linux 2.6.24+ (considered experimental until 2.6.29), you can use network namespaces to run a process without network access. You need network namespaces enabled in your kernel (CONFIG_NET_NS=y) and util-linux with the unshare tool.
Then, starting a process without network access is as simple as:
unshare -n program ...
This creates an empty network namespace for the process. That is, it is run with no network interfaces, not even loopback. In the example below, we add -r to run the program only after the current effective user and group IDs have been mapped to the superuser ones (avoiding sudo):
$ unshare -r -n ping 127.0.0.1
connect: Network is unreachable
If your app needs a network interface you can set a new one up:
$ unshare -n -- sh -c 'ip link set dev lo up; ping 127.0.0.1'
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=32 time=0.066 ms
Note that this will create a new, local loopback. That is, the spawned process won't be able to access open ports on the host's 127.0.0.1.
If you need to gain access to the original networking inside the namespace, you can use nsenter to enter another namespace. The following example runs ping with the network namespace used by PID 1 (specified through -t 1):
$ nsenter -n -t 1 -- ping -c4 example.com
PING example.com (93.184.216.119) 56(84) bytes of data.
64 bytes from 93.184.216.119: icmp_seq=1 ttl=50 time=134 ms
64 bytes from 93.184.216.119: icmp_seq=2 ttl=50 time=134 ms
64 bytes from 93.184.216.119: icmp_seq=3 ttl=50 time=134 ms
64 bytes from 93.184.216.119: icmp_seq=4 ttl=50 time=139 ms
--- example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 134.621/136.028/139.848/2.252 ms