SSH Connections – Running in Background Don’t Exit

Tags: shell, ssh, stdin, terminal

Apparently, if the same shell launches multiple ssh connections to the same server, they don't return after executing the command they're given but hang (Stopped (tty input)) forever. To illustrate:

#!/bin/bash
ssh localhost sleep 2
echo "$$ DONE!"

If I run the script above more than once in the background, it never exits:

$ for i in {1..3}; do foo.sh & done
[1] 28695
[2] 28696
[3] 28697
$                      ## Hit enter

[1]   Stopped                 foo.sh

[2]-  Stopped                 foo.sh

[3]+  Stopped                 foo.sh
$                      ## Hit enter again        
$ jobs -l
[1]  28695 Stopped (tty input)     foo.sh
[2]- 28696 Stopped (tty input)     foo.sh
[3]+ 28697 Stopped (tty input)     foo.sh

Details

  • I found this because I was ssh'ing in a Perl script to run a command. The same behavior occurs when using Perl's system() call to launch ssh.
  • The same issue occurs when using Perl modules instead of system(). I tried Net::SSH::Perl, Net::SSH2 and Net::OpenSSH.
  • If I run the multiple ssh commands from different shells (open multiple terminals) they work as expected.
  • Nothing obviously useful in the ssh connection debugging info:

    OpenSSH_7.5p1, OpenSSL 1.1.0f  25 May 2017
    debug1: Reading configuration data /home/terdon/.ssh/config
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug2: resolving "localhost" port 22
    debug2: ssh_connect_direct: needpriv 0
    debug1: Connecting to localhost [::1] port 22.
    debug1: Connection established.
    debug1: identity file /home/terdon/.ssh/id_rsa type 1
    debug1: key_load_public: No such file or directory
    debug1: identity file /home/terdon/.ssh/id_rsa-cert type -1
    debug1: key_load_public: No such file or directory
    debug1: identity file /home/terdon/.ssh/id_dsa type -1
    debug1: key_load_public: No such file or directory
    debug1: identity file /home/terdon/.ssh/id_dsa-cert type -1
    debug1: key_load_public: No such file or directory
    debug1: identity file /home/terdon/.ssh/id_ecdsa type -1
    debug1: key_load_public: No such file or directory
    debug1: identity file /home/terdon/.ssh/id_ecdsa-cert type -1
    debug1: key_load_public: No such file or directory
    debug1: identity file /home/terdon/.ssh/id_ed25519 type -1
    debug1: key_load_public: No such file or directory
    debug1: identity file /home/terdon/.ssh/id_ed25519-cert type -1
    debug1: Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_7.5
    debug1: Remote protocol version 2.0, remote software version OpenSSH_7.5
    debug1: match: OpenSSH_7.5 pat OpenSSH* compat 0x04000000
    debug2: fd 3 setting O_NONBLOCK
    debug1: Authenticating to localhost:22 as 'terdon'
    debug3: hostkeys_foreach: reading file "/home/terdon/.ssh/known_hosts"
    debug3: record_hostkey: found key type ECDSA in file /home/terdon/.ssh/known_hosts:47
    debug3: load_hostkeys: loaded 1 keys from localhost
    debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
    debug3: send packet: type 20
    debug1: SSH2_MSG_KEXINIT sent
    debug3: receive packet: type 20
    debug1: SSH2_MSG_KEXINIT received
    debug2: local client KEXINIT proposal
    debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ext-info-c
    debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
    debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
    debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
    debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
    debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
    debug2: compression ctos: none,zlib@openssh.com,zlib
    debug2: compression stoc: none,zlib@openssh.com,zlib
    debug2: languages ctos: 
    debug2: languages stoc: 
    debug2: first_kex_follows 0 
    debug2: reserved 0 
    debug2: peer server KEXINIT proposal
    debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
    debug2: host key algorithms: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
    debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
    debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
    debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
    debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
    debug2: compression ctos: none,zlib@openssh.com
    debug2: compression stoc: none,zlib@openssh.com
    debug2: languages ctos: 
    debug2: languages stoc: 
    debug2: first_kex_follows 0 
    debug2: reserved 0 
    debug1: kex: algorithm: curve25519-sha256
    debug1: kex: host key algorithm: ecdsa-sha2-nistp256
    debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
    debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
    debug3: send packet: type 30
    debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
    debug3: receive packet: type 31
    debug1: Server host key: ecdsa-sha2-nistp256 SHA256:uxhkh+gGPiCJQPaP024WXHth382h3BTs7QdGMokB9VM
    debug3: hostkeys_foreach: reading file "/home/terdon/.ssh/known_hosts"
    debug3: record_hostkey: found key type ECDSA in file /home/terdon/.ssh/known_hosts:47
    debug3: load_hostkeys: loaded 1 keys from localhost
    debug1: Host 'localhost' is known and matches the ECDSA host key.
    debug1: Found key in /home/terdon/.ssh/known_hosts:47
    debug3: send packet: type 21
    debug2: set_newkeys: mode 1
    debug1: rekey after 134217728 blocks
    debug1: SSH2_MSG_NEWKEYS sent
    debug1: expecting SSH2_MSG_NEWKEYS
    debug3: receive packet: type 21
    debug1: SSH2_MSG_NEWKEYS received
    debug2: set_newkeys: mode 0
    debug1: rekey after 134217728 blocks
    debug2: key: /home/terdon/.ssh/id_rsa (0x555a5e4b5060)
    debug2: key: /home/terdon/.ssh/id_dsa ((nil))
    debug2: key: /home/terdon/.ssh/id_ecdsa ((nil))
    debug2: key: /home/terdon/.ssh/id_ed25519 ((nil))
    debug3: send packet: type 5
    debug3: receive packet: type 7
    debug1: SSH2_MSG_EXT_INFO received
    debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
    debug3: receive packet: type 6
    debug2: service_accept: ssh-userauth
    debug1: SSH2_MSG_SERVICE_ACCEPT received
    debug3: send packet: type 50
    debug3: receive packet: type 51
    debug1: Authentications that can continue: publickey,password
    debug3: start over, passed a different list publickey,password
    debug3: preferred publickey,keyboard-interactive,password
    debug3: authmethod_lookup publickey
    debug3: remaining preferred: keyboard-interactive,password
    debug3: authmethod_is_enabled publickey
    debug1: Next authentication method: publickey
    debug1: Offering RSA public key: /home/terdon/.ssh/id_rsa
    debug3: send_pubkey_test
    debug3: send packet: type 50
    debug2: we sent a publickey packet, wait for reply
    debug3: receive packet: type 60
    debug1: Server accepts key: pkalg rsa-sha2-512 blen 279
    debug2: input_userauth_pk_ok: fp SHA256:OGvtyUIFJw426w/FK/RvIhsykeP8kIEAtAeZwYBIzok
    debug3: sign_and_send_pubkey: RSA SHA256:OGvtyUIFJw426w/FK/RvIhsykeP8kIEAtAeZwYBIzok
    debug3: send packet: type 50
    debug3: receive packet: type 52
    debug1: Authentication succeeded (publickey).
    Authenticated to localhost ([::1]:22).
    debug2: fd 6 setting O_NONBLOCK
    debug1: channel 0: new [client-session]
    debug3: ssh_session2_open: channel_new: 0
    debug2: channel 0: send open
    debug3: send packet: type 90
    debug1: Requesting no-more-sessions@openssh.com
    debug3: send packet: type 80
    debug1: Entering interactive session.
    debug1: pledge: network
    debug3: receive packet: type 80
    debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
    debug3: receive packet: type 91
    debug2: callback start
    debug2: fd 3 setting TCP_NODELAY
    debug3: ssh_packet_set_tos: set IPV6_TCLASS 0x08
    debug2: client_session2_setup: id 0
    debug1: Sending command: sleep 2
    debug2: channel 0: request exec confirm 1
    debug3: send packet: type 98
    debug2: callback done
    debug2: channel 0: open confirm rwindow 0 rmax 32768
    debug2: channel 0: rcvd adjust 2097152
    debug3: receive packet: type 99
    debug2: channel_input_status_confirm: type 99 id 0
    debug2: exec request accepted on channel 0
    
  • This doesn't depend on my ~/.ssh/config setup. Renaming the file doesn't change anything.

  • This happens on multiple machines. I've tried 4 or 5 different machines running updated Ubuntu and Arch distros.
  • The command (sleep in the dummy example but something a good deal more complex in real life) exits successfully and does what it's supposed to do. This doesn't depend on the command you're running; it's an ssh issue.
  • This is the worst of them: it isn't consistent. Every now and then, one of the instances will exit and return control to the parent script. But not always, and there is no pattern I've been able to discern.
  • Renaming ~/.bashrc makes no difference. Also, I've run this on machines running Ubuntu (default login shell dash) and Arch (default login shell bash, called as sh).
  • Interestingly, the issue only occurs if I hit any key (for example Enter, but any seems to work) after launching the loop but before the first script exits. If I leave the terminal alone, they finish as expected.

What's going on? Is this a bug in ssh? Is there an option I need to set? How can I launch multiple instances of a script that runs a command over ssh from the same shell?

Best Answer

Foreground processes and terminal access control

To understand what is going on, you need to know a little about sharing terminals. What happens when two programs try to read from the same terminal at the same time? Each input byte goes randomly to one of the programs. (Not random as in the kernel uses an RNG to decide, just random as in unpredictable in practice.) The same thing happens when two programs read from a pipe, or any other file type which is a stream of bytes being moved from one place to another (socket, character device, …), rather than a byte array where any byte can be read multiple times (regular file, block device). For example, run a shell in a terminal, figure out the name of the terminal and run cat.

$ tty
/dev/pts/18
$ cat

Then from another terminal, run cat /dev/pts/18. Now type in the terminal, and watch as lines sometimes go to one of the cat processes and sometimes to the other. Lines are dispatched as a whole when the terminal is in cooked mode. If you put the terminal in raw mode then each byte would be dispatched independently.
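The pipe case mentioned above can be tried without a second terminal. The following is only a sketch: the function name share_pipe_demo and the file names reader_a and reader_b are made up for illustration, and it assumes a sleep that accepts fractional seconds. Two cat processes read from the same pipe, and each line written by the producer ends up in exactly one of them. Which one is not predictable.

```shell
#!/bin/bash
# Sketch: two readers share the read end of one pipe.
# share_pipe_demo DIR stores what each reader received in DIR/reader_a and
# DIR/reader_b (hypothetical names, chosen only for this example).
share_pipe_demo() {
    local dir=$1
    # Producer: one short write per line, paced so both readers get a chance.
    for n in 1 2 3 4 5 6; do
        printf 'line %s\n' "$n"
        sleep 0.1
    done | {
        # fd 3 duplicates the pipe's read end; both cats read from it.
        cat <&3 >"$dir/reader_a" &
        cat <&3 >"$dir/reader_b" &
        wait
    } 3<&0
}

out_dir=$(mktemp -d)
share_pipe_demo "$out_dir"
echo "reader_a got:"; cat "$out_dir/reader_a"
echo "reader_b got:"; cat "$out_dir/reader_b"
```

Between them the two files always hold all six lines exactly once, but the split usually differs from run to run.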

That's messy. Surely there should be a mechanism to decide that one program gets the terminal, and the others don't. Well, there is! It triggers in typical cases, but not in the scenario I set up above. That scenario is unusual because cat /dev/pts/18 wasn't started from /dev/pts/18. It's unusual to access a terminal from a program that wasn't started inside this terminal. In the usual case, you run a shell in a terminal, and you run programs from that shell. Then the rule is that the program in the foreground gets the terminal, and programs in the background don't. This is known as terminal access control. The way it works is:

  • Each process has a controlling terminal (or doesn't have one, typically because it doesn't have any open file descriptor that's a terminal).
  • When a process tries to access its controlling terminal, if the process is not in the foreground, then the kernel blocks it. (Conditions apply. Access to other terminals is not regulated.)
  • The shell decides who is the foreground process. (Foreground process group, actually.) It calls tcsetpgrp to let the kernel know who should be in the foreground.

This works in typical cases. Run a program in a shell, and that program gets to be the foreground process. Run a program in the background (with &), and the program doesn't get to be in the foreground. When the shell is displaying a prompt, the shell puts itself in the foreground. When you resume a suspended job with fg, the job gets to be in the foreground. With bg, it doesn't.

If a background process tries to read from the terminal, the kernel sends it a SIGTTIN signal. The default action of the signal is to suspend the process (like SIGSTOP). The parent of the process can know about this by calling waitpid with the WUNTRACED flag (or waitid with WSTOPPED); when a child process receives a signal that suspends it, the waitpid call in the parent returns and lets the parent know what the signal was. This is how the shell knows to print “Stopped (tty input)”. What it's telling you is that this job is suspended due to a SIGTTIN.

Since the process is suspended, nothing will happen to it until it's resumed or killed (with a signal that the process doesn't catch, because if the process has set a signal handler, the handler won't run while the process is suspended). You can resume the process by sending it a SIGCONT, but that won't achieve anything if the process is reading from the terminal: it'll receive another SIGTTIN immediately. If you resume the process with fg, it goes to the foreground and so the read succeeds.
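The suspend and resume cycle can be observed from a script. This is a rough sketch, not a faithful reproduction: the helpers proc_state, is_stopped and demo_stop_cont are hypothetical names, SIGSTOP is used as a stand-in for SIGTTIN because it can be sent reliably without a terminal, and it assumes a ps that accepts -o stat= and -p (procps and the BSD ps do).

```shell
#!/bin/bash
# Hypothetical helpers; they assume "ps -o stat= -p PID" is supported.

# Print the state column for PID $1; a leading "T" means "stopped".
proc_state() {
    ps -o stat= -p "$1" | tr -d ' '
}

# Succeed if PID $1 is currently stopped.
is_stopped() {
    case "$(proc_state "$1")" in
        T*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Stop a background sleep, resume it, and report its state after each step.
demo_stop_cont() {
    local pid
    sleep 30 >/dev/null &
    pid=$!
    kill -STOP "$pid"   # stand-in for the SIGTTIN the kernel would send
    sleep 0.2           # give the signal time to take effect
    if is_stopped "$pid"; then
        echo "after SIGSTOP: stopped"
    else
        echo "after SIGSTOP: not stopped"
    fi
    kill -CONT "$pid"   # resume, as a manual SIGCONT would
    sleep 0.2
    if is_stopped "$pid"; then
        echo "after SIGCONT: stopped"
    else
        echo "after SIGCONT: running"
    fi
    kill "$pid"
    wait "$pid" 2>/dev/null
    return 0
}

demo_stop_cont
```

The sleep here stays running after SIGCONT because it never touches the terminal. A process that was stopped while reading from the terminal would retry the read and be stopped again right away, as described above.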

Now you understand what happens when you run cat in the background:

$ cat &
$ 
[1] + Stopped (tty input)        cat
$ 

The case of SSH

Now let's do the same thing with SSH.

$ ssh localhost sleep 999999 &
$ 
$ 
$ 
[1] + Stopped (tty input)        ssh localhost sleep 999999
$ 

Pressing Enter sometimes goes to the shell (which is in the foreground), and sometimes to the SSH process (at which point it gets stopped by SIGTTIN). Why? If ssh was reading from the terminal, it should receive SIGTTIN immediately, and if it wasn't then why does it receive SIGTTIN?

What's happening is that the SSH process calls the select system call to know when input is available on any of the files it's interested in (or if an output file is ready to receive more data). The input sources include at least the terminal and the network socket. Unlike read, select is not forbidden to background processes, and ssh doesn't receive a SIGTTIN when it calls select. The intent of select is to find out whether data is available, without disrupting anything. Ideally select would not change the system state at all, but in fact this isn't completely true. When select tells the SSH process that input is available on the terminal file descriptor, the kernel has to commit to sending input if the process calls read afterwards. (If it didn't, and the process called read, then there might be no input available at this point, so the return value from select would have been a lie.) So if the kernel decides to route some input to the SSH process, it decides by the time the select system call returns. Then SSH calls read, and at that point the kernel sees that a background process tried to read from the terminal and suspends it with SIGTTIN.

Note that you don't need to launch multiple connections to the same server. One is enough. Multiple connections merely increase the probability that the problem arises.

The solution: don't read from the terminal

If you need the SSH session to read from the terminal, run it in the foreground.

If you don't need the SSH session to read from the terminal, make sure that its input is not coming from the terminal. There are two ways to do this:

  • You can redirect the input:

    ssh … </dev/null
    
  • You can tell SSH not to read from standard input with -n or -f. (-n is equivalent to </dev/null; -f implies -n but still allows SSH itself to read from the terminal before it goes to the background, e.g. to read a password, and the command itself won't have the terminal open.)

    ssh -n …
    

Note that the disconnection between the terminal and SSH has to happen on the client. The sleep process running on the server will never read from the terminal, but SSH has no way to know that. If the client receives input on standard input, it must forward it to the server, which will make the data available in a buffer in case the application ever decides to read it (and if the application calls select, it'll be informed that data is available).
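For launching several connections from one shell, the redirection can be wrapped in a small helper. This is a minimal sketch under the assumption that none of the commands needs terminal input. run_detached is a made-up name, not an ssh feature.

```shell
#!/bin/bash
# Hypothetical helper: start a command in the background with its standard
# input taken from /dev/null, so it never reads from the terminal and
# cannot be stopped by SIGTTIN.
run_detached() {
    "$@" </dev/null &
}

# Intended use, with the host and command from the question:
#   for i in {1..3}; do run_detached ssh localhost sleep 2; done
#   wait    # returns once all three connections have finished
```

Because the helper is a shell function, $! still refers to the background process afterwards, so a single connection can be waited for with wait "$!". Passing -n to ssh instead of using the helper has the same effect on the client's standard input.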