To answer the main question in short: rsync seems to write double the number of bytes because it spawns two processes to do the copy, so the data is written twice -- once into the stream between the processes, and once from the receiving process to the target file.
We can tell this by looking at the strace output in more detail: the process IDs at the beginning of each line, together with the file descriptor numbers in the write calls, let us tell the different write "streams" apart.
Presumably, this is so that a local transfer can work just like a remote transfer, only the source and destination are on the same system.
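The same double write happens in any two-process copy pipeline. As a crude analogy (not rsync's actual mechanism -- rsync uses a socket pair between its processes -- and the filenames here are made up):

```shell
# Two-process copy: the first cat writes every byte into the pipe,
# and the second cat writes it again into the destination file,
# so the data is written twice in total.
printf 'hello\n' > src
cat src | cat > dst
cmp src dst && echo "copies match"
```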
Using something like strace -e trace=process,socketpair,open,read,write would show the extra processes being spawned, the socket pair being created between them, and the different processes opening the input and output files.
A test run similar to yours:
$ rm test2
$ strace -f -e trace=process,socketpair,open,close,dup,dup2,read,write -o rsync.log rsync -avcz --progress test1 test2
$ ls -l test1 test2
-rw-r--r-- 1 itvirta itvirta 81920004 Jun 21 20:20 test1
-rw-r--r-- 1 itvirta itvirta 81920004 Jun 21 20:20 test2
Let's take a count of bytes written for each thread separately:
$ for x in 15007 15008 15009 ; do echo -en "$x: " ; grep -E "$x (<... )?write" rsync.log | awk 'BEGIN {FS=" = "} {sum += $2} END {print sum}' ; done
15007: 81967265
15008: 49
15009: 81920056
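The streams can also be separated directly by file descriptor. A small sketch of that accounting, run here on a few fabricated strace-style lines (the PIDs, fds, and sizes below are illustrative, not taken from the run above):

```shell
# Fabricated strace -f style output for demonstration
cat > sample.log <<'EOF'
15007 write(3, "\0\0\0"..., 262144) = 262144
15007 write(1, "progress"..., 40) = 40
15009 write(4, "\0\0\0"..., 262144) = 262144
15009 write(4, "\0\0\0"..., 56) = 56
EOF

# Sum the bytes written per (pid, fd) pair: each distinct pair
# corresponds to one write "stream"
awk '$2 ~ /^write\(/ {
        fd = $2; sub(/^write\(/, "", fd); sub(/,$/, "", fd)
        sum[$1 " fd" fd] += $NF
     } END { for (k in sum) print k, sum[k] }' sample.log
```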
Which matches the theory above pretty well. I didn't check what the extra ~47 kB written by the first process is, but I'll assume it's the progress output, plus whatever metadata about the synced file rsync needs to transfer to the other end.
I didn't check, but I'd expect that even with the delta-transfer algorithm enabled, the "remote" end of rsync still writes out (most of) the file in full, resulting in approximately the same amount of writes as with cp. The transfer between the rsync processes is smaller, but the final output is still the same.
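As a rough check of that claim, the same accounting can be applied to cp. A sketch (the filenames are made up, and strace must be available -- the guard line skips the example where it isn't; a modern cp may use copy_file_range or sendfile instead of plain write, so those are traced too):

```shell
# Requires strace; skip gracefully where it isn't available
command -v strace >/dev/null 2>&1 || exit 0

# Create a small source file and trace cp's data-moving syscalls
dd if=/dev/zero of=test1 bs=1024 count=64 2>/dev/null
strace -e trace=write,copy_file_range,sendfile -o cp.log cp test1 test2-cp

# Total bytes moved: roughly the file size once (65536 here),
# not twice as with rsync's sender/receiver pair
awk 'BEGIN {FS=" = "} /^(write|copy_file_range|sendfile)\(/ {sum += $2}
     END {print sum+0}' cp.log
```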
The sshfs FUSE filesystem is implemented by presenting a filesystem on top of sftp, the SSH file transfer protocol. As a result, any file access, such as editing with vi[m], first requires the sshfs subsystem to copy the file to a cache on the local filesystem. If the file is particularly large, or the network between your client and the server is particularly slow, it will take a measurable amount of time to transfer the file before it's accessible locally.
It's (very) broadly equivalent to the following (except that it uses sftp instead of scp):
# Copy the remote file to a temporary local cache
scp -p remote:/path/to/file /tmp/file.tmp
checksum=$(cksum /tmp/file.tmp)
# Action on remote file is implemented by performing the action locally
vi /tmp/file.tmp
# Simplified; we would also need to handle local rm/mv -> remote rm/mv, etc.
[[ "$(cksum /tmp/file.tmp)" != "$checksum" ]] && scp -p /tmp/file.tmp remote:/path/to/file
As a consequence, you'll find that running gcc locally against files on the sshfs mount is measurably slower than just logging in to the remote server and running it there. To be honest, I'm not overly surprised that "gcc crashes when trying to compile files on the remote fs". It shouldn't, of course, but then think about what's actually going on in the background...
The original problem (based on reading all the comments on the OP's question) was that the scp executable on the 64-bit system was a 32-bit application. A 32-bit application that isn't compiled with "large-file support" ends up with seek pointers that are limited to 2^32 =~ 4 GB.
You can tell whether scp is 32-bit by using the file command. On most modern systems it will be 64-bit, so no file truncation would occur.
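A sketch of that check (the path and exact output vary by system; the output lines in the comments are typical examples, not taken from any particular machine):

```shell
# Ask file(1) what kind of binary scp is; fall back to sh just so the
# example runs even where scp isn't installed
file "$(command -v scp || command -v sh)"
# Typical 64-bit result (illustrative):
#   /usr/bin/scp: ELF 64-bit LSB pie executable, x86-64, ...
# A problematic 32-bit build would instead report something like:
#   /usr/bin/scp: ELF 32-bit LSB executable, Intel 80386, ...
```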
A 32-bit application can still support "large files", but it has to be compiled with large-file support, which apparently wasn't the case here.
The recommended solution is probably to use a full standard 64-bit distribution, where applications are compiled as 64-bit by default.
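For reference, the arithmetic behind that limit (a shell sketch; the 2^31 figure applies when the seek pointer is a signed type):

```shell
# A 32-bit seek pointer without large-file support caps offsets at:
echo "2^32 = $(( 65536 * 65536 )) bytes ($(( 65536 * 65536 / 1024 / 1024 / 1024 )) GiB)"
# With a signed 32-bit off_t the practical limit is half that:
echo "2^31 = $(( 32768 * 65536 )) bytes ($(( 32768 * 65536 / 1024 / 1024 / 1024 )) GiB)"
```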