Given podman is installed on a Linux system and a systemd unit named `baz.service`:
# /etc/systemd/system/baz.service
[Service]
ExecStart=/usr/bin/podman run --rm --tty --name baz alpine sh -c 'while true; do date; sleep 1; done'
ExecStop=/usr/bin/podman stop baz
And `baz.service` is started:
# systemctl daemon-reload
# systemctl start baz.service
Then when I check the status of the unit, I don't see the `sh` or `sleep` process in the `/system.slice/baz.service` cgroup:
# systemctl status baz
● baz.service
Loaded: loaded (/etc/systemd/system/baz.service; static; vendor preset: enabl
Active: active (running) since Sat 2019-08-10 05:50:18 UTC; 14s ago
Main PID: 16910 (podman)
Tasks: 9
Memory: 7.3M
CPU: 68ms
CGroup: /system.slice/baz.service
└─16910 /usr/bin/podman run --rm --tty --name baz alpine sh -c while
# ...
I was expecting to see the `sh` and `sleep` children in my `baz.service` status because I've heard people from Red Hat say podman uses a traditional fork-exec model.
If podman did fork and exec, then wouldn't my `sh` and `sleep` processes be children of podman and be in the same cgroup as the original podman process?
I was expecting to be able to use systemd and podman to manage my containers without the children going off to a different parent and escaping from my `baz.service` systemd unit.
Looking at the output of `ps`, I can see that `sh` and `sleep` are actually children of a different process called `conmon`. I'm not sure where `conmon` came from or how it was started, but systemd didn't capture it.
# ps -Heo user,pid,ppid,comm
# ...
root 17254 1 podman
root 17331 1 conmon
root 17345 17331 sh
root 17380 17345 sleep
From the output it's clear that my baz.service unit is not managing the conmon -> sh -> sleep chain.
- How is podman different from the docker client server model?
- How is podman's conmon different from docker's containerd?
Maybe they are both container runtimes and the `dockerd` daemon is what people want to get rid of.
So maybe docker is like:
- dockerd daemon
- docker cli
- containerd container runtime
And podman is like:
- podman cli
- conmon container runtime
So maybe podman uses a traditional fork-exec model, but it's not the podman CLI that forks and execs; it's the conmon process.
I feel confused.
Best Answer
The whole idea behind `podman` is to move away from a centralized architecture with an all-powerful overseer (e.g. `dockerd`), where the centralized daemon is a single point of failure. There is even a hashtag about this: "#nobigfatdaemons".
How do you avoid centralized container management? You remove the single main daemon (again, `dockerd`) and start the containers independently (at the end of the day, containers are just processes, so you don't need a daemon to spawn them).
However, you still need a way to:
- hold the `stdout` and `stderr` of the container;
- call `wait(2)` on the container's PID 1.
For this purpose, each podman container is still supervised by a small daemon called `conmon` (from "container monitor"). The difference from the Docker daemon is that this daemon is as small as possible (check the size of the source code), and it is spawned per container. If `conmon` for one container crashes, the rest of the system stays unaffected.
Next, how does the container get spawned?
Considering that the user may want to run the container in the background, like with Docker, the `podman run` process forks twice and only then executes `conmon`:
The middle process between `podman run` and `conmon` (i.e. the direct parent of `conmon`; in the example above it is PID 8484) will exit and `conmon` will be reparented to `init`, thus becoming a self-managed daemon. After this, `conmon` also forks off the runtime (e.g. `runc`) and, finally, the runtime executes the container's entrypoint (e.g. `/bin/sh`).
When the container is running, `podman run` is no longer required and may exit, but in your case it stays online, because you did not ask it to detach from the container.
Next,
`podman` makes use of cgroups to limit the containers. This means that it creates new cgroups for new containers and moves the processes there. By the rules of cgroups, a process may be a member of only one cgroup at a time, and adding a process to a cgroup removes it from the cgroup it was previously in within the same hierarchy. So, when the container is started, the final layout of cgroups looks like the following: `podman run` remains in the cgroups of `baz.service`, created by `systemd`; the `conmon` process is placed in its own cgroups; and the containerized processes are placed in their own cgroups:
Note: PID 13075 above is actually a `sleep 1` process, spawned after the death of PID 13043.
Hope this helps.
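
To see the two effects described above separately, here is a minimal sketch in plain shell, assuming a Linux system with `/proc` mounted. `sleep 30` is only a stand-in for `conmon`, and the variable names are made up for this illustration; none of it is podman's own code. It shows that orphaning a process changes its parent but not its cgroup, so a process leaves the unit's cgroup only when something moves it explicitly.

```shell
#!/bin/sh
# Illustration only: `sleep 30` stands in for conmon (an assumption of this sketch).

# The command substitution runs in a subshell that acts as the short-lived
# middle process: it starts the long-running child, prints its PID and exits.
orphan_pid=$(sleep 30 >/dev/null 2>&1 & echo "$!")

# Read the orphan's current parent from /proc/<pid>/status.
orphan_ppid=
while read -r key value; do
    if [ "$key" = "PPid:" ]; then
        orphan_ppid=$value
    fi
done < "/proc/$orphan_pid/status"

# cgroup membership is inherited on fork and is not changed by reparenting.
self_cgroup=$(cat "/proc/$$/cgroup")
orphan_cgroup=$(cat "/proc/$orphan_pid/cgroup")

echo "orphan $orphan_pid now has parent $orphan_ppid (this shell is $$)"
echo "shell cgroup:  $self_cgroup"
echo "orphan cgroup: $orphan_cgroup"

# Clean up the stand-in process.
kill "$orphan_pid"
```

The reported parent is typically PID 1 or another process acting as a subreaper, while the two cgroup lines match. Comparing `/proc/<pid>/cgroup` for the `podman`, `conmon` and `sh` PIDs from the question in the same way should show where each one ended up.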