Linux – Why can I not bind a mount namespace to a file

bind-mount linux namespace unshare

I observe the following:

As unprivileged user in shell No 1:

user@box:~$ sysctl kernel.unprivileged_userns_clone
kernel.unprivileged_userns_clone = 1
user@box:~$ unshare --mount --user
nobody@box:~$ echo $$
18655

And as root in shell No 2:

root@box:~# mkdir -p /tmp/myns
root@box:~# touch /tmp/myns/{user,mnt}
root@box:~# mount --bind /proc/18655/ns/user /tmp/myns/user 
root@box:~# mount --bind /proc/18655/ns/mnt /tmp/myns/mnt
mount: /tmp/myns/mnt: wrong fs type, bad option, bad superblock on /proc/18655/ns/mnt, missing codepage or helper program, or other error.

The error comes as a surprise: I cannot bind-mount a mount namespace to a file, but I can bind-mount a user-namespace to a file? Why's that, and how can I make this mount-namespace available to an unprivileged user?

Why I want this: For testing a program, I want to overlay ~user with a temporary file system, initially sharing the original contents. It may be set up by root along the lines of

tmp='/tmp/GAtcNNeSfM8b'
mkdir -p "$tmp"
mount -t tmpfs -o size=100m tmpfs "$tmp"
mkdir -p "${tmp}/"{upper,work,lower}
mount --bind -o ro /home/user "${tmp}/lower"

unshare -m
mount -t overlay -o"lowerdir=${tmp}/lower,upperdir=${tmp}/upper,workdir=${tmp}/work" overlay /home/user
touch /tmp/namespace
mount --bind /proc/self/ns/mnt /tmp/namespace

but the last line fails.

The intention is that an unprivileged user may nsenter --mount=/tmp/namespace, and see the same system as before, except that changes to /home/user are not persistent. Actually, I do not even want to unshare the user namespace.

I am consciously trying to avoid the overhead of LXC, Docker or even VirtualBox. I think this should be possible with standard Linux tools.

Update: I'm running an up-to-date ArchLinux, with

$ uname -r
5.0.10-arch1-1-ARCH

Best Answer

Given that it only affects the mount namespace, I strongly suspect that this is due to one of the loop prevention checks for mount namespaces. I do not think it is exactly the same case as the one the link talks about, because unshare --mount defaults to setting mount propagation to private, i.e. disabling it.

However, to protect against certain race conditions, I think full correctness might require that you mount your mount namespaces inside a mount which has private mount propagation. I also think it might be cleanest (easiest to debug) to make that mount unbindable. (I think unbindable already includes all the effects of private.)

I.e. mount your mount namespaces inside a directory prepared using:

mount --bind /var/local/lib/myns/ /var/local/lib/myns/
mount --make-unbindable /var/local/lib/myns/

In general I think this is the safest approach, to avoid ever triggering such a problem.
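Putting those steps together with the bind mount from the question, the whole sequence might look like the sketch below. It is only an illustration of the suggestion above, not a verified fix: the helper names run, prepare_ns_dir and pin_mount_ns are made up for this example, the directory and PID are the ones used earlier on this page, and the real commands must be run as root from outside the target namespace. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only. The function names are hypothetical; run as root.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Prepare a directory that holds namespace bind mounts.
# $1 = directory, e.g. /var/local/lib/myns
prepare_ns_dir() {
    run mkdir -p "$1"
    # Bind the directory onto itself so it becomes a mount point whose
    # propagation can be changed independently of its parent mount.
    run mount --bind "$1" "$1"
    # Mark it unbindable, as suggested above.
    run mount --make-unbindable "$1"
    # Create the file that the namespace will be bound to.
    run touch "$1/mnt"
}

# Bind the mount namespace of a running process into that directory.
# $1 = PID of a process inside the target mount namespace
# $2 = directory prepared by prepare_ns_dir
pin_mount_ns() {
    run mount --bind "/proc/$1/ns/mnt" "$2/mnt"
}
```

For example, prepare_ns_dir /var/local/lib/myns followed by pin_mount_ns 18655 /var/local/lib/myns, after which nsenter --mount=/var/local/lib/myns/mnt would be the way to enter it. The unshare(1) man page describes a similar pattern with --make-private on the directory and unshare --mount=FILE, which performs the bind mount itself; that variant may be worth trying if the unbindable one does not help.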

My race condition is hypothetical. I would not expect you to be hitting it most of the time. So I do not know what your actual problem is.
