SSH – How Remote Command Line Arguments Get Parsed

Tags: bash, process, quoting, shell, ssh

I've seen the questions and answers about needing to double-escape the arguments to remote ssh commands. My question is: Exactly where and when does the second parsing get done?

If I run the following:

$ ssh otherhost pstree -a -p

I see the following in the output:

  |-sshd,3736
  |   `-sshd,1102
  |       `-sshd,1109
  |           `-pstree,1112 -a -p

The parent process of the remote command (pstree) is sshd. There doesn't appear to be any shell there that would parse the command line arguments of the remote command, so it doesn't seem as if double quoting or escaping would be necessary (but it definitely is). If I instead ssh there first to get a login shell and then run pstree -a -p, I see the following in the output:

  ├─sshd,3736
  │   └─sshd,3733
  │       └─sshd,3735
  │           └─bash,3737
  │               └─pstree,4130 -a -p

So clearly there's a bash shell there that would do command line parsing in that case. But when I run a remote command directly, there doesn't seem to be a shell, so why is double quoting necessary?

Best Answer

There is always a remote shell. In the SSH protocol, the client sends the server a string to execute. The SSH command line client takes its command line arguments and concatenates them with a space between the arguments. The server takes that string, runs the user's login shell and passes it that string. (More precisely: the server runs the program that is registered as the user's shell in the user database, passing it two command line arguments: -c and the string sent by the client. The shell is not invoked as a login shell: the server does not set the zeroth argument to a string beginning with -.)
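The two parsing passes can be reproduced locally without any SSH connection. The following is a minimal sketch, not what the ssh client actually executes: fake_ssh is a hypothetical helper that joins its arguments with spaces (assuming the default IFS) and hands the resulting string to sh -c, with sh standing in for the remote user's shell.

```shell
# Hypothetical helper: mimics the client joining its arguments with spaces
# and the server passing the resulting string to the user's shell with -c.
# "sh" is a stand-in for the remote user's shell (an assumption).
fake_ssh() {
    joined="$*"        # arguments joined with single spaces (default IFS)
    sh -c "$joined"    # the second shell parses the string all over again
}

# The local shell removes the quotes, so the second shell receives the
# string: echo a   b   (and splits it into separate words)
fake_ssh echo 'a   b'

# An extra layer of quoting survives the local shell, so the second shell
# receives the string: echo 'a   b'
fake_ssh echo "'a   b'"
```

The first call loses the run of spaces because the second shell splits the string into words again; the second call keeps them because the inner quotes reach the second shell intact.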

It is impossible to bypass the remote shell. The protocol has no way to send an array of strings that the server could use directly as an argv array. And the SSH server will not bypass the remote shell because the shell itself may be a security restriction: using a restricted program as the user's shell is a way to provide a restricted account that is only allowed to run certain commands (e.g. an rsync-only account or a git-only account).
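To illustrate why the server always goes through the user's shell, here is a rough sketch of what a restricted shell could look like. The function name restricted_shell and its whitelist are hypothetical, and it only prints what it would run. A real restricted shell has to parse the command string far more carefully than this prefix match does.

```shell
# Hypothetical restricted "shell", written as a function for illustration.
# It expects to be called the way the server calls a user's shell:
#   restricted_shell -c 'command string'
restricted_shell() {
    if [ "$1" != "-c" ]; then
        echo "interactive login not allowed" >&2
        return 1
    fi
    case "$2" in
        "git-upload-pack "*|"git-receive-pack "*)
            # Illustration only: report the command instead of running it.
            echo "would run: $2"
            ;;
        *)
            echo "command not allowed" >&2
            return 1
            ;;
    esac
}

restricted_shell -c 'git-upload-pack repo.git'
restricted_shell -c 'rm -rf /' || echo "rejected"
```

Because the server hands over the whole command as a single string, the restricted program sees everything the client asked for and can refuse it.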

You may not see the shell in pstree because it may already be gone. Many shells have an optimization: if they detect that they are about to “run this external command, wait for it to complete, and exit with the command's status”, they call execve on the external command instead, replacing themselves with it. This is what's happening in your first example. Contrast the following three commands:

ssh otherhost pstree -a -p
ssh otherhost 'pstree -a -p'
ssh otherhost 'pstree -a -p; true'

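The first two are identical: the client sends exactly the same data to the server. The third one sends a shell command that defeats the shell's exec optimization, because the shell still has to run true after pstree exits, so it must stay around as the parent of pstree.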
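The same contrast can be observed locally without pstree. In this sketch, probe and INNER are hypothetical names: an outer sh -c receives a command string, as it would from the SSH server, and starts an inner shell that compares its own PID with the outer shell's PID. If the two match, the outer shell replaced itself with the inner one rather than forking.

```shell
# Hypothetical probe. The inner shell compares its own PID ($$) with the
# PID of the shell that launched it, which is passed in as $1.
INNER='if [ "$$" = "$1" ]; then echo "outer shell replaced itself"; else echo "outer shell forked"; fi'
export INNER

probe() {
    # $1 is the command string handed to the outer shell with -c.
    sh -c "$1"
}

# Single external command: a candidate for the exec optimization.
# The result depends on which shell "sh" is and on its version.
probe 'sh -c "$INNER" inner $$'

# The trailing "; true" means the outer shell still has work to do after
# the inner shell exits, so it has to fork.
probe 'sh -c "$INNER" inner $$; true'
```

The second call reports a fork; what the first call reports depends on whether the local sh implements the optimization.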