I am trying to copy files over SSH, but cannot use scp because I don't know the exact filename that I need. Although small binary files and text files transfer fine, large binary files get altered. Here is the file on the server:
remote$ ls -la
-rw-rw-r-- 1 user user 244970907 Aug 24 11:11 foo.gz
remote$ md5sum foo.gz
9b5a44dad9d129bab52cbc6d806e7fda foo.gz
Here is the file after I've moved it over:
local$ time ssh me@server.com -t 'cat /path/to/foo.gz' > latest.gz
real 1m52.098s
user 0m2.608s
sys 0m4.370s
local$ md5sum latest.gz
76fae9d6a4711bad1560092b539d034b latest.gz
local$ ls -la
-rw-rw-r-- 1 dotancohen dotancohen 245849912 Aug 24 18:26 latest.gz
Note that the downloaded file is bigger than the one on the server! However, if I do the same with a very small file, then everything works as expected:
remote$ echo "Hello" | gzip -c > hello.txt.gz
remote$ md5sum hello.txt.gz
08bf5080733d46a47d339520176b9211 hello.txt.gz
local$ time ssh me@server.com -t 'cat /path/to/hello.txt.gz' > hi.txt.gz
real 0m3.041s
user 0m0.013s
sys 0m0.005s
local$ md5sum hi.txt.gz
08bf5080733d46a47d339520176b9211 hi.txt.gz
Both file sizes are 26 bytes in this case.
Why might small files transfer fine, but large files get some bytes added to them?
Best Answer
TL;DR
Don't use -t. -t involves a pseudo-terminal on the remote host and should only be used to run visual applications from a terminal.
Explanation
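The linefeed character (also known as newline or \n) is the one that, when sent to a terminal, tells the terminal to move its cursor down.
Yet, when you run seq 3 in a terminal, that is, where seq writes 1\n2\n3\n to something like /dev/pts/0, you don't see a staircase, with each number one row down and one column to the right of the previous one, but three numbers, each at the start of its own line.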
Why is that?
Actually, when seq 3 (or ssh host seq 3 for that matter) writes 1\n2\n3\n, the terminal sees 1\r\n2\r\n3\r\n. That is, the line-feeds have been translated to carriage-return (upon which terminals move their cursor back to the left of the screen) and line-feed.
That is done by the terminal device driver. More exactly, by the line-discipline of the terminal (or pseudo-terminal) device, a software module that resides in the kernel.
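The difference between what seq writes and what the terminal receives can be made visible without a terminal at all. The sketch below dumps the bytes seq 3 writes into a pipe, then passes the same output through a small awk filter that imitates the LF -> CRLF translation. The helper names dump_bytes and emulate_onlcr are made up for this illustration, and the awk filter is only a stand-in for the kernel's line discipline, not the real thing.

```shell
# Print the bytes read on stdin as one space-separated line, in od -c notation.
dump_bytes() {
    od -An -c | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical stand-in for the onlcr translation: end every line with CR LF.
emulate_onlcr() {
    awk '{ printf "%s\r\n", $0 }'
}

# What seq writes when its stdout is a pipe (no line discipline involved).
pipe_bytes=$(seq 3 | dump_bytes)
printf 'written by seq:   %s\n' "$pipe_bytes"

# What a terminal would receive after the LF -> CRLF translation.
tty_bytes=$(seq 3 | emulate_onlcr | dump_bytes)
printf 'seen by terminal: %s\n' "$tty_bytes"
```

The first line lists only \n bytes; the second has a \r in front of each \n.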
You can control the behaviour of that line discipline with the stty command. The translation of LF -> CRLF is turned on with stty onlcr (which is generally enabled by default). You can turn it off with stty -onlcr, or you can turn all output processing off with stty -opost.
If you do that and run seq 3, you'll then see each number one row down and one column to the right of the previous one (a staircase), as expected.
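To check which of these output flags are currently set, the output of stty -a can be searched for the flag name: a bare name means on, a leading minus means off. The following is a rough sketch; flag_state is a hypothetical helper, and it assumes the usual stty -a layout of flags separated by spaces and semicolons.

```shell
# Hypothetical helper: read `stty -a` style text on stdin and report whether
# the flag named in $1 is on ("onlcr") or off ("-onlcr").
flag_state() {
    tr -s ' ;\n' '\n\n\n' |
        awk -v f="$1" '$0 == f { print "on" } $0 == "-" f { print "off" }'
}

# stty only works when stdin is a terminal, so guard the call.
if [ -t 0 ]; then
    printf 'opost: %s\n' "$(stty -a | flag_state opost)"
    printf 'onlcr: %s\n' "$(stty -a | flag_state onlcr)"
else
    echo "stdin is not a terminal; nothing to inspect"
fi
```

Run in an interactive terminal with default settings, both flags would normally be reported as on.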
Now, when you do:
local$ seq 3 > some-file
seq is no longer writing to a terminal device; it's writing into a regular file, so no translation is done. So some-file does contain 1\n2\n3\n. The translation is only done when writing to a terminal device. And it's only done for display.
Similarly, when you do:
local$ ssh host seq 3
ssh is writing 1\n2\n3\n regardless of what ssh's output goes to.
What actually happens is that the seq 3 command is run on host with its stdout redirected to a pipe. The ssh server on host reads the other end of the pipe and sends it over the encrypted channel to your ssh client, and the ssh client writes it onto its stdout, in your case a pseudo-terminal device, where LFs are translated to CRLF for display.
Many interactive applications behave differently when their stdout is not a terminal. For instance, if you run:
local$ ssh host vi
vi doesn't like it; it doesn't like its output going to a pipe. It thinks it's not talking to a device that is able to understand cursor positioning escape sequences, for instance.
So ssh has the -t option for that. With that option, the ssh server on host creates a pseudo-terminal device and makes that the stdout (and stdin, and stderr) of vi. What vi writes on that terminal device goes through that remote pseudo-terminal line discipline and is read by the ssh server and sent over the encrypted channel to the ssh client. It's the same as before except that instead of using a pipe, the ssh server uses a pseudo-terminal.
The other difference is that on the client side, the ssh client sets the terminal in raw mode. That means that no translation is done there (opost is disabled and also other input-side behaviours). For instance, when you type Ctrl-C, instead of interrupting ssh, that ^C character is sent to the remote side, where the line discipline of the remote pseudo-terminal sends the interrupt to the remote command.
When you do:
local$ ssh -t host seq 3
seq 3 writes 1\n2\n3\n to its stdout, which is a pseudo-terminal device. Because of onlcr, that gets translated on host to 1\r\n2\r\n3\r\n and sent to you over the encrypted channel. On your side there is no translation (onlcr disabled), so 1\r\n2\r\n3\r\n is displayed untouched (because of the raw mode) and correctly on the screen of your terminal emulator.
Now, if you do:
local$ ssh -t host seq 3 > some-file
There's no difference from above. ssh will write the same thing: 1\r\n2\r\n3\r\n, but this time into some-file.
So basically all the LF in the output of seq have been translated to CRLF into some-file.
It's the same if you do:
local$ ssh -t host cat remote-file > local-file
All the LF characters (0x0a bytes) are being translated into CRLF (0x0d 0x0a).
That's probably the reason for the corruption in your file. In the case of the second smaller file, it just so happens that the file doesn't contain 0x0a bytes, so there is no corruption.
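One way to test this explanation against the numbers in the question: if LF -> CRLF translation is the only alteration, the downloaded file should be larger by exactly the number of 0x0a bytes in the original. The sketch below computes the size difference from the two ls listings and defines a small helper, count_lf (a name invented here), that could be run against foo.gz on the server for comparison. The sample file is throwaway data used only to show the helper working.

```shell
# Sizes reported by ls in the question (foo.gz on the server, latest.gz locally).
remote_size=244970907
local_size=245849912
added=$((local_size - remote_size))
printf 'bytes added in transit: %s\n' "$added"

# Hypothetical helper: count the 0x0a bytes in a file. Under onlcr, each of
# them gains one 0x0d byte. LC_ALL=C keeps tr byte-oriented on binary data.
count_lf() {
    LC_ALL=C tr -cd '\n' < "$1" | wc -c | tr -d ' '
}

# Self-check on a throwaway file holding two LF bytes among binary data.
sample=$(mktemp)
printf 'a\nb\000c\n\377' > "$sample"
printf 'LF bytes in sample: %s\n' "$(count_lf "$sample")"
rm -f "$sample"
```

If count_lf run on the server's foo.gz reports the same number as the size difference, that would support the diagnosis; a different number would point to some other tty setting or to extra output from startup files.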
Note that you could get different types of corruption with different tty settings. Another potential type of corruption associated with -t is if your startup files on host (~/.bashrc, ~/.ssh/rc...) write things to their stderr, because with -t the stdout and stderr of the remote shell end up being merged into ssh's stdout (they both go to the pseudo-terminal device).
You don't want the remote cat to output to a terminal device there.
You want:
local$ ssh host cat remote-file > local-file
You could do:
local$ ssh -t host 'stty -opost; cat remote-file' > local-file
That would work (except in the writing to stderr corruption case discussed above), but even that would be sub-optimal as you'd have that unnecessary pseudo-terminal layer running on host.
Some more fun:
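Without -t, ssh localhost echo | od -tx1 shows a single 0a byte. OK.
With -t, ssh -t localhost echo | od -tx1 shows 0d 0a. LF translated to CRLF.
With output processing turned off, ssh -t localhost 'stty -opost; echo' | od -tx1 shows a single 0a byte. OK again.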
Setting stty olcuc on the remote pseudo-terminal, for instance, makes echo x come back as X. That's another form of output post-processing that can be done by the terminal line discipline.
ssh refuses to tell the server to use a pseudo-terminal when its own input is not a terminal. You can force it with -tt though, as in echo x | ssh -tt localhost 'stty -opost; echo' | od -c.
The line discipline does a lot more on the input side.
Here, echo doesn't read its input, nor was it asked to output that x\r\n\n, so where does that come from? That's the local echo of the remote pseudo-terminal (stty echo). The ssh server is feeding the x\n it read from the client to the master side of the remote pseudo-terminal, and the line discipline of that pseudo-terminal echoes it back (before stty -opost is run, which is why we see a CRLF and not LF). That's independent of whether the remote application reads anything from stdin or not.
For example, with a remote command such as sleep 2, a 0x3 character fed to the input of ssh -tt is echoed back as ^C (^ and C) because of stty echoctl, and the shell and sleep receive a SIGINT because of stty isig.
So while:
local$ ssh -t host cat remote-file > local-file
is bad enough, using:
local$ ssh -tt host 'cat > remote-file' < local-file
to transfer files the other way across is a lot worse. You'll get some CR -> LF translation, but also problems with all the special characters (^C, ^Z, ^D, ^?, ^S...), and the remote cat will not see eof when the end of local-file is reached, only when ^D is sent after a \r, \n or another ^D, like when doing cat > file in your terminal.
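Coming back to the download in the question: a gzip file carries its own checksum, so gzip -t can flag this kind of damage locally without needing the md5sum from the server. The sketch below builds a throwaway gzip file, applies an LF -> CRLF rewrite with perl to imitate what the remote pseudo-terminal does, and tests both copies. The file names, the check_gz helper and the use of perl as a stand-in for the line discipline are all assumptions made for this illustration.

```shell
workdir=$(mktemp -d)

# A throwaway gzip file, large enough that its compressed bytes include 0x0a.
seq 1 100000 | gzip -n > "$workdir/good.gz"

# Imitate onlcr on the raw byte stream: every LF becomes CR LF.
perl -pe 's/\n/\r\n/' < "$workdir/good.gz" > "$workdir/bad.gz"

# Hypothetical helper: report whether gzip accepts the file.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo intact
    else
        echo corrupt
    fi
}

good_state=$(check_gz "$workdir/good.gz")
bad_state=$(check_gz "$workdir/bad.gz")
printf 'good.gz: %s\nbad.gz: %s\n' "$good_state" "$bad_state"

rm -rf "$workdir"
```

The same gzip -t check could be run on latest.gz after fetching it without -t.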