Reliable Way to Jail Child Processes Using Nsenter

namespace process

I know that Linux namespaces, among many other things, can be leveraged to handle restricting and jailing child processes securely without any chance of their being zombied and dumped on init. But I'm fuzzy on implementation details. How might I use the tools provided by util-linux such as mount and nsenter to watch, monitor, and ensure that all processes launched are the direct namespace descendants of another process?

Best Answer

Create a PID namespace

The correct command to use here is unshare. Note that the necessary options to do this are only available from util-linux 2.23. The idea is to create a new PID namespace for the program you are running such that all its children are also created in this namespace. You can run a command in a new PID namespace simply by doing:

sudo unshare -fp some_command

To run a shell, just omit the command. This creates a process which, along with all of its children, has an ordinary PID in the parent (system) namespace. Within the new namespace, however, it has a PID of 1 along with some of the special characteristics of the init process. Perhaps the most relevant characteristic from a monitoring perspective is that if any of its descendants are orphaned, they are re-parented to this process rather than to the real init process.
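The two views of the same process can be inspected from the outside. A minimal sketch, assuming a kernel that exposes the NSpid field in /proc/<pid>/status (Linux 4.1 or later); the function name ns_pids is made up for illustration:

```shell
# Print the PIDs a process has in each nested PID namespace, one per line.
# The first value is the PID as seen by the procfs being read; each
# following value is the PID in the next inner namespace.
# Argument: path to a /proc/<pid>/status file.
ns_pids() {
    awk '$1 == "NSpid:" { for (i = 2; i <= NF; i++) print $i }' "$1"
}
```

Calling ns_pids /proc/$PID/status for the top-level process of the new namespace, with $PID being its PID in the parent namespace, should print that PID followed by 1.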

Simply doing this may be enough for most monitoring cases. As previously mentioned, the processes within the namespace all have PIDs within the parent namespace, so regular commands can be used to monitor their activity. We are also assured that if any process in the namespace becomes orphaned, it will not fall out of the process tree beneath the PID of the top-level program, so it can still easily be tracked.
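As a rough illustration of monitoring from the parent namespace, the sketch below lists everything beneath a given PID. It is one possible approach rather than a standard tool: the function name descendants is hypothetical, and it expects "PID PPID" pairs on standard input so that it can be fed from ps.

```shell
# Print every descendant of the given root PID, one per line, sorted.
# Reads "PID PPID" pairs on stdin.
descendants() {
    awk -v root="$1" '
        { parent[$1] = $2 }
        END {
            for (pid in parent) {
                # Walk up the parent chain until we reach root or run out.
                p = pid
                while (p in parent && p != root) p = parent[p]
                if (p == root && pid != root) print pid
            }
        }' | sort -n
}
```

For example, ps -e -o pid=,ppid= | descendants "$TOP_PID" would list every process under the top-level program, where TOP_PID is assumed to hold that program's PID in the parent namespace. Because orphans are re-parented to the namespace's PID 1, they stay in this list.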

Combine with a mount namespace

However, what we can't do is monitor a process with respect to the PID that it thinks it has. To do this, and in particular to be able to use the ps command within the new namespace, you need to mount a separate procfs filesystem for the namespace. This in turn leads to another problem, since the only location that ps accepts for procfs is /proc. One solution would be to create a chroot jail and mount the new procfs there. But this is a cumbersome approach, as at a minimum we would need to copy (or at least hard link) any binaries that we intend to use, along with any libraries they depend on, to the new root.

The solution is to also use a new mount namespace. Within it we can mount the new procfs at the usual /proc location, where it is usable inside the PID namespace and doesn't interfere with anything outside it. To make this very simple, the unshare command provides the --mount-proc option, which also implies a new mount namespace:

sudo unshare -fp --mount-proc some_command

Now running ps within the combined namespaces will show only the processes within the PID namespace, and it will show the top-level process as having a PID of 1.

What about nsenter?

As the name suggests, nsenter can be used to enter a namespace that has already been created with unshare. This is useful if we want to get information only available from inside the namespace from an otherwise unrelated script. The simplest way is to give the PID of any program running within the namespace. To be clear, this must be the PID of the target program as seen from the namespace in which nsenter is being run (since namespaces can be nested, it is possible for a single process to have many PIDs). To run a shell in the target PID/mount namespace, simply do:

sudo nsenter -t $PID -m -p

If this namespace is set up as above, ps will now list only processes within that namespace.
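Before entering, it can be useful to confirm that the target PID really is in a different PID namespace from the current shell. The kernel exposes each process's PID namespace as a symlink at /proc/<pid>/ns/pid, and two processes share a namespace when those links read the same. A minimal sketch, with hypothetical function names; PROC_ROOT is an assumption added only so the functions can be pointed at a different tree, and it defaults to /proc:

```shell
# Print the PID-namespace identifier of a process, e.g. "pid:[4026531836]".
# PROC_ROOT is a hypothetical override; by default the real /proc is used.
pid_ns_id() {
    readlink "${PROC_ROOT:-/proc}/$1/ns/pid"
}

# Exit status 0 if both PIDs are in the same PID namespace, 1 if they
# differ, 2 if either link could not be read.
same_pid_ns() {
    ns_a=$(pid_ns_id "$1") || return 2
    ns_b=$(pid_ns_id "$2") || return 2
    [ "$ns_a" = "$ns_b" ]
}
```

For example, same_pid_ns "$$" "$PID" fails with status 1 when $PID belongs to a namespace created with unshare as above. Reading the link for another user's process generally requires root.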