Debian – Unable to kill process with PID 1 in docker container

Tags: debian, docker, kill, process

I have the following Dockerfile for creating a container with a powerdns recursor in it:

FROM debian:stretch-slim
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && \
    apt-get install --no-install-recommends -y \
    pdns-recursor && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean
COPY ./configuration/recursor.conf /etc/powerdns/recursor.conf
RUN chown -R :pdns /etc/powerdns/ && \
    chmod 0750 /etc/powerdns/ && \
    chmod 0640 /etc/powerdns/recursor.conf
EXPOSE 8699
ENTRYPOINT ["/usr/sbin/pdns_recursor", "--daemon=no"]

My recursor.conf looks like this:

config-dir=/etc/powerdns
forward-zones=resolver1.opendns.com=208.67.222.222
hint-file=/usr/share/dns/root.hints
local-address=0.0.0.0
local-port=8699
quiet=yes
security-poll-suffix=
setgid=pdns
setuid=pdns

IPv6 is disabled on the hypervisor.

The problem is that Docker is not able to stop the container properly with docker stop recursor. After some time the OOM killer terminates the program, leaving the following status:

Exited (137) 2 seconds ago

I searched the web, and exit code 137 (128 + 9, that is, signal 9) supposedly means that I don't have sufficient RAM, which is simply not the case. When I run docker exec -it recursor /bin/bash and try to kill PID 1 (kill -9 -- 1) from within the container, nothing happens: the service simply continues to run.

I also tried to start the recursor in daemon-mode – same result.

Does anyone have an idea why that is so?

Best Answer

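The process with PID 1 is the init process, and that stays true inside a PID namespace or a container. The kernel treats it specially: a signal sent from inside the namespace reaches PID 1 only if PID 1 has installed a handler for that signal. No process can install a handler for SIGKILL, so a SIGKILL sent from inside the container is silently dropped, unlike for any other userland process.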
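The same rule applies to SIGTERM, which is what docker stop sends first: if PID 1 has no handler for it, nothing happens. One way around this is to give PID 1 a handler. A minimal sketch, assuming a hypothetical wrapper script (the function name run_as_init and the file name entrypoint.sh are illustrative and not part of the original setup) that runs as PID 1, installs the handler itself, and forwards the signal to the real service:

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper (not from the original question).
# run_as_init CMD [ARGS...]: start CMD in the background, forward
# SIGTERM/SIGINT to it, and return its exit status.
run_as_init() {
    "$@" &
    child=$!

    # Installing a handler is what makes SIGTERM deliverable to PID 1.
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT

    rc=0
    wait "$child" || rc=$?

    # If wait was interrupted by the trapped signal, the child may still be
    # shutting down; wait once more to collect its real exit status.
    if kill -0 "$child" 2>/dev/null; then
        rc=0
        wait "$child" || rc=$?
    fi

    trap - TERM INT
    return "$rc"
}

# Run the command passed as arguments, if any.
if [ "$#" -gt 0 ]; then
    run_as_init "$@"
fi
```

In the Dockerfile this could be wired up as ENTRYPOINT ["/entrypoint.sh", "/usr/sbin/pdns_recursor", "--daemon=no"], assuming the script is copied to /entrypoint.sh. Whether pdns_recursor actually needs this depends on whether it installs its own SIGTERM handler, which the question does not establish. Starting the container with docker run --init has a similar effect, since Docker then runs a small init process as PID 1 that forwards signals.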

If you really want to kill the container's PID 1, you have to do it from the host, where the kernel does deliver SIGKILL to it. Run on the host (with enough privileges, probably root):

kill -KILL $(docker inspect --format '{{.State.Pid}}' containername)

This will bring down the whole container, since removing its PID 1 means stopping the container. Please note that I answered the title of the question, not the underlying problem of what is causing the OOM.
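Note that exit status 137 by itself does not prove an OOM kill. It only says that the process died from signal 9 (128 + 9), and docker stop itself sends SIGKILL once its grace period (10 seconds by default) expires after an unanswered SIGTERM. A small sketch, independent of Docker, that shows where the number comes from:

```shell
#!/bin/sh
# Start a throwaway background process and kill it with SIGKILL (signal 9).
sleep 30 &
pid=$!
kill -KILL "$pid"

# wait reports 128 + signal number for a child that died from a signal.
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # prints "exit status: 137"
```

To check whether the kernel OOM killer was really involved, docker inspect --format '{{.State.OOMKilled}}' containername reports true only in that case.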

UPDATE: it is probably simpler to use docker kill, which sends SIGKILL by default. That would be:

docker kill containername

UPDATE2: to convince yourself that PID 1 cannot be killed with SIGKILL (aka -9) from inside its own namespace, even in a container, try the following. The example requires user namespaces to be enabled; otherwise just run unshare --mount-proc --fork --pid as root.

first terminal:

$ unshare --map-root-user --mount-proc --fork --pid
# echo $$
1
# pstree -p
bash(1)---pstree(88)
# kill -9 1
#

no effect

On a second terminal:

$ pstree -p $(pidof unshare)
unshare(2023)───bash(2024)
$ kill -9 2024

first terminal:

# Killed
$
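Whether a given PID 1 would receive SIGTERM can be checked from its caught-signal mask. A rough sketch, assuming a Linux /proc filesystem; catches_signal is a hypothetical helper name, and the SigCgt field of /proc/<pid>/status is a hexadecimal bitmask in which bit N - 1 stands for signal N:

```shell
#!/bin/sh
# catches_signal MASK SIG: succeed if bit (SIG - 1) is set in the hex mask
# taken from the SigCgt line of /proc/<pid>/status.
# (Hypothetical helper, written for illustration.)
catches_signal() {
    mask=$1
    sig=$2
    [ $(( (0x$mask >> ($sig - 1)) & 1 )) -eq 1 ]
}

# Example with the current shell's own mask (Linux only).
if [ -r "/proc/$$/status" ]; then
    own_mask=$(awk '/^SigCgt:/ { print $2 }' "/proc/$$/status")
    if catches_signal "$own_mask" 15; then
        echo "this shell handles SIGTERM"
    else
        echo "this shell has no SIGTERM handler"
    fi
fi
```

Inside the container from the question, the mask for PID 1 could be read with docker exec recursor grep SigCgt /proc/1/status and passed to the helper. Signal 9 never shows up in that mask, because SIGKILL cannot be caught.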

Related Solutions

Shell Process – Can’t Kill Gedit Process from Its PID

The gedit process is already terminated.

Remember how Windows applications mainly worked back in the Win16 days, before Win32 came along and did away with it? There were hInstance and hPrevInstance, and attempting to run a second instance of many applications simply handed things over to the first instance. This made things difficult for command scripting tools (like Take Command): one would invoke an application a second time and it would visibly be there on the screen as an added window, but as far as the command interpreter was concerned the child process that it had just run had exited immediately.

Well GNOME has brought the Win16 behaviour back for Linux.

With GIO applications like gedit, the application behaves as follows:

If there's no registered "server" named org.gnome.gedit already on the per-user/per-login Desktop Bus, gedit decides that it's the first instance. It becomes the org.gnome.gedit server and continues to run.
If there is a registered "server" named org.gnome.gedit already on the per-user/per-login Desktop Bus, gedit decides that it's a second or subsequent instance. It constructs Desktop Bus messages to the first instance, passing along its command line options and arguments, and then simply exits.

So what you see depends on whether you have the gedit server already running. If you haven't, you'll be in sebvieira's shoes and wondering why you aren't seeing the behaviour described. If you have, you'll be in your own shoes, seeing the gedit process terminate almost immediately, especially since you haven't given it any command-line options or arguments to send over to the "first instance". Hence there's no longer a process with that ID.

Much fun ensues when, as alluded to above, the per-login Desktop Bus is switched to the "new" style of a per-user Desktop Bus, and suddenly there's not a 1:1 relationship between a Desktop Bus and an X display any more. Single user-bus-wide instance applications suddenly have to be capable of talking to multiple X displays concurrently.

Further hilarity ensues when people attempt to run gedit as the superuser via sudo, as it either cannot connect to a per-user Desktop Bus or connects to the wrong (the superuser's) Desktop Bus.

There's a proposal to give gedit a command-line option that makes the process that is invoked just be the actual editor application, so that gedit would be useful as the editor pointed-to by the EDITOR environment variable (which it isn't for many common usages of EDITOR, from crontab to git, when it just exits immediately). This proposal has not become reality yet.

In the meantime, people have various ways of having a simple second instance of a "lightweight text editor", such as invoking a whole new Desktop Bus instance, private to the invocation of gedit, with dbus-run-session. Of course, this tends to spin up other GNOME Desktop Bus servers on this private bus as they are in turn invoked by gedit, making it not "lightweight" at all.

The icing on the cake is when you've followed this recommendation or one like it and interposed a shell function named gedit that immediately removes the gedit process from the shell's list of jobs. Not only does the process terminate rapidly so that you don't see it later with kill or ps, but the shell doesn't even monitor it as a shell-controlled job.
