Ubuntu – SSH Server stops working after reboot, caused by missing /var/run/sshd

16.04openvzserversshstartup

My VPS wasn't rebooted for about 3 months. It is hosted on a server with OpenVZ virtualization type and the operating system is Ubuntu 16.04.
For some reason, I rebooted the VPS and after that, I couldn't connect to the server through ssh, the message that I received is:

ssh: connect to host srvname.com port 22: Connection refused

So I opened a Serial Console on the VPS and start investigating… I've purged and reinstalled the openssh-server with no success. I spent two hours reading articles, question, and answers about similar issues on Internet.

Finally I managed to understand that the directory /var/run/sshd is not created during the system startup. And once I create it manually I can start the SSH service without any problem, but on the next reboot the issue remains. So my questions are:

  • What could be the cause of this issue? Why /var/run/sshd is not created during the system startup?

  • How can I solve the issue in a proper way? I found a temporal solution that is mentioned at the end of this post.

  • Does the issue could be related to the OpenVZ host of the VPS? Should I ask the hosting provider to solve it?


The output of systemctl status ssh.service, sshd -Ddp 22 and journalctl -xe is:

# systemctl status ssh.service
● ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit-hit) since вт 2019-01-15 12:58:08 EET; 22s ago
  Process: 407 ExecStartPre=/usr/sbin/sshd -t (code=exited, status=255)

яну 15 12:58:07 srvname systemd[1]: Failed to start OpenBSD Secure Shell server.
яну 15 12:58:07 srvname systemd[1]: ssh.service: Unit entered failed state.
яну 15 12:58:07 srvname systemd[1]: ssh.service: Failed with result 'exit-code'.
яну 15 12:58:08 srvname systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
яну 15 12:58:08 srvname systemd[1]: Stopped OpenBSD Secure Shell server.
яну 15 12:58:08 srvname systemd[1]: ssh.service: Start request repeated too quickly.
яну 15 12:58:08 srvname systemd[1]: Failed to start OpenBSD Secure Shell server.
яну 15 12:58:08 srvname systemd[1]: ssh.service: Unit entered failed state.
яну 15 12:58:08 srvname systemd[1]: ssh.service: Failed with result 'start-limit-hit'.


# $(which sshd) -Ddp 22
debug1: sshd version OpenSSH_7.2, OpenSSL 1.0.2g  1 Mar 2016
debug1: private host key #0: ssh-rsa SHA256:...
debug1: private host key #1: ssh-dss SHA256:...
debug1: private host key #2: ecdsa-sha2-nistp256 SHA256:...
debug1: private host key #3: ssh-ed25519 SHA256:...
Missing privilege separation directory: /var/run/sshd


# journalctl -xe
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit ssh.service has begun starting up.
яну 15 13:21:21 srvname sshd[1688]: Missing privilege separation directory: /var/run/sshd
яну 15 13:21:21 srvname systemd[1]: ssh.service: Control process exited, code=exited status=255
яну 15 13:21:21 srvname systemd[1]: Failed to start OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit ssh.service has failed.
-- 
-- The result is failed.
яну 15 13:21:21 srvname systemd[1]: ssh.service: Unit entered failed state.
яну 15 13:21:21 srvname systemd[1]: ssh.service: Failed with result 'exit-code'.
яну 15 13:21:22 srvname systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
яну 15 13:21:22 srvname systemd[1]: Stopped OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit ssh.service has finished shutting down.
яну 15 13:21:22 srvname systemd[1]: Starting OpenBSD Secure Shell server...
-- Subject: Unit ssh.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit ssh.service has begun starting up.
яну 15 13:21:22 srvname sshd[1691]: Missing privilege separation directory: /var/run/sshd
яну 15 13:21:22 srvname systemd[1]: ssh.service: Control process exited, code=exited status=255
яну 15 13:21:22 srvname systemd[1]: Failed to start OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit ssh.service has failed.
-- 
-- The result is failed.
яну 15 13:21:22 srvname systemd[1]: ssh.service: Unit entered failed state.
яну 15 13:21:22 srvname systemd[1]: ssh.service: Failed with result 'exit-code'.
яну 15 13:21:22 srvname systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
яну 15 13:21:22 srvname systemd[1]: Stopped OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit ssh.service has finished shutting down.
яну 15 13:21:22 srvname systemd[1]: ssh.service: Start request repeated too quickly.
яну 15 13:21:22 srvname systemd[1]: Failed to start OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit ssh.service has failed.
-- 
-- The result is failed.
яну 15 13:21:22 srvname systemd[1]: ssh.service: Unit entered failed state.
яну 15 13:21:22 srvname systemd[1]: ssh.service: Failed with result 'start-limit-hit'.

The content of /usr/lib/tmpfiles.d/sshd.conf and /etc/init/ssh.conf is:

# cat /usr/lib/tmpfiles.d/sshd.conf 
d /var/run/sshd 0755 root root

# cat /etc/init/ssh.conf | sed '/^#/ d'

description "OpenSSH server"

start on runlevel [2345]
stop on runlevel [!2345]

respawn
respawn limit 10 5
umask 022

env SSH_SIGSTOP=1
expect stop

console none

pre-start script
    test -x /usr/sbin/sshd || { stop; exit 0; }
    test -e /etc/ssh/sshd_not_to_be_run && { stop; exit 0; }

    mkdir -p -m0755 /var/run/sshd
end script

exec /usr/sbin/sshd -D

Additional information about the system:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:    16.04
Codename:   xenial

# uname -a
Linux srvname 2.6.32-042stab127.2 #1 SMP Thu Jan 4 16:41:44 MSK 2018 x86_64 x86_64 x86_64 GNU/Linux

# apt show openssh-server | grep 'Version'
Version: 1:7.2p2-4ubuntu2.6

The temporal solution:
I found that /var/run is a symbolic link to /run, I do not know why this is needed, but when I modified the content of the file /usr/lib/tmpfiles.d/sshd.conf from:

d /var/run/sshd 0755 root root

to:

d /run/sshd 0755 root root

everything goes well on system startup, the SSH service is started normally and I'm able to log-in via SSH.

Best Answer

I found this is a bug with the current version of systemd and old kernels that are used by some VPS privdes as it is in my case. This bug appears time to time, as we can see on Launchpad: Bug #45234, Bug #1811580; or on ServerFault: Why am I missing /var/run/sshd after every boot?

There are few workarounds of this issue, they all come together to alternative way to create /var/run/sshd before running the SSH server. Here are three possible solutions.


Workaround 1: Modify /usr/lib/tmpfiles.d/sshd.conf in the following way:

d /run/sshd 0755 root root

As it is mentioned in the question, /var/run is a symbolic link to /run, the final result is identical: /var/run/sshd is created. I do not know why, but this works.


Workaround 2: Use Cron job that will create /var/run/sshd and restart the SSH server, you can use the root's crontab for this purpose - execute sudo crontab -e and add the following entry:

@reboot mkdir -p -m0755 /var/run/sshd && systemctl restart ssh.service

Currently I'm using this solution, so it is also tested.


Workaround 3: Use /etc/rc.local to do the same as the above, as it is shown in this comment on bug report #45234.

Related Question