I know that Linux namespaces, among many other things, can be leveraged to handle restricting and jailing child processes securely without any chance of their being zombied and dumped on `init`. But I'm fuzzy on implementation details. How might I use the tools provided by util-linux, such as `mount` and `nsenter`, to watch, monitor, and ensure that all processes launched are the direct namespace descendants of another process?
Reliable Way to Jail Child Processes Using Nsenter
namespace process
Best Answer
Create a PID namespace
The correct command to use here is `unshare`. Note that the necessary options to do this are only available from util-linux 2.23. The idea is to create a new PID namespace for the program you are running such that all its children are also created in this namespace. You can run a command in a new PID namespace by passing it to `unshare` with the `--fork` and `--pid` options (run as root); to run a shell instead, just omit the command.

This will create a process which, along with any of its children, will have a PID as usual within the parent (system) namespace. However, within the new namespace, it will have a PID of `1` along with some of the special characteristics of the `init` process. Perhaps the most relevant characteristic from a monitoring perspective is that if any of its descendants are orphaned, they will be re-parented to this process rather than the real `init` process.

Simply doing this may be enough for most monitoring cases. As previously mentioned, the processes within the namespace all have PIDs within the parent namespace, so regular commands can be used to monitor their activity. We are also assured that if any process in the namespace becomes orphaned, it will not fall out of the process tree branches beneath the PID of the top-level program, meaning that it can still easily be kept track of.
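As a rough sketch of the approach in this section (not a definitive implementation): `run_in_pidns` wraps the `unshare` invocation using its long options `--fork` and `--pid`, and `is_descendant` checks from the parent namespace whether a PID still sits beneath the top-level program by walking the `PPid:` field of `/proc/<pid>/status`. The function names are hypothetical and chosen only for illustration, and `unshare` needs root.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical, not part of util-linux.

# Run "$@" as PID 1 of a new PID namespace (needs root, util-linux >= 2.23).
# With no arguments, unshare starts a shell instead.
run_in_pidns() {
    unshare --fork --pid "$@"
}

# Print the parent PID of process $1, read from /proc/<pid>/status.
ppid_of() {
    [ -r "/proc/$1/status" ] || return 1
    while read -r key value rest; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Succeed if process $1 is process $2 or one of its descendants,
# using PIDs as seen from the namespace this script runs in.
is_descendant() {
    pid=$1
    while [ "$pid" -gt 0 ]; do
        [ "$pid" -eq "$2" ] && return 0
        pid=$(ppid_of "$pid") || return 1
    done
    return 1
}
```

For example, after starting `run_in_pidns some_command &` as root, `is_descendant "$pid" "$!"` should keep succeeding for any process started inside the namespace, even after its original parent exits, because orphans are re-parented to the namespace's PID `1` rather than to the real `init`.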
Combine with a mount namespace
However, what we can't do is monitor the process with respect to the PID that it thinks it has. To do this, and in particular to be able to use the `ps` command within the new namespace, you need to mount a separate `procfs` filesystem for the namespace. This in turn leads to another problem, since the only location that `ps` accepts for `procfs` is `/proc`. One solution would be to create a `chroot` jail and mount the new `procfs` there. But this is a cumbersome approach, as at a minimum we would need to copy (or at least hard link) any binaries that we intend to use, along with any libraries they depend on, to the new root.

The solution is to also use a new mount namespace. Within this we can mount the new `procfs` in a way that uses the true root `/proc` directory, is usable within the PID namespace and doesn't interfere with anything else. To make this process very simple, the `unshare` command provides the `--mount-proc` option, which is given alongside the PID namespace options.

Now running `ps` within the combined namespaces will show only the processes within the PID namespace, and it will show the top-level process as having a PID of `1`.

What about `nsenter`?
As the name suggests,
`nsenter` can be used to enter a namespace that has already been created with `unshare`. This is useful if we want to get information only available from inside the namespace from an otherwise unrelated script. The simplest way is to give the PID of any program running within the namespace. To be clear, this must be the PID of the target program as seen from the namespace in which `nsenter` is being run (since namespaces can be nested, it is possible for a single process to have many PIDs). To run a shell in the target PID/mount namespace, pass that PID to `nsenter` with the `--target` option, together with `--mount` and `--pid`. If this namespace is set up as above, `ps` will now list only processes within that namespace.
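Putting the last two sections together, a minimal sketch might look like the following. `run_jailed` starts a command with `unshare --fork --pid --mount-proc`, and `enter_jail` joins an existing namespace with `nsenter --target PID --mount --pid`. Both function names are hypothetical, both commands need root, and the long options are used here as equivalents of the short forms `-f`, `-p`, `-t` and `-m`.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical, not part of util-linux.

# Start "$@" as PID 1 of a new PID namespace, with a private /proc mounted
# in a new mount namespace so that ps inside sees only the jailed processes.
run_jailed() {
    unshare --fork --pid --mount-proc "$@"
}

# Run "$@" (a shell if no command is given) inside the mount and PID
# namespaces of process $1. $1 is a PID as seen from the caller's namespace.
enter_jail() {
    target=$1
    shift
    nsenter --target "$target" --mount --pid "$@"
}
```

The PID passed to `enter_jail` must belong to a process that is actually inside the namespace. With `--fork`, that is the child that `unshare` forks (PID `1` inside the namespace), not the `unshare` process itself, which stays in its original PID namespace. With such a PID, `enter_jail "$pid" ps -e` should list only the jailed processes.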