Linux – Mounting and Unmounting Inherited Mounts in New Namespace

linux-kernel mount namespace

Experiment 1

From outside the namespace, cat /proc/self/mountinfo gives

291 34 0:37 / /tmp/IMJUSTTMP rw,relatime shared:152 - tmpfs tmpfs rw,size=102400k
34 23 0:32 / /tmp rw,nosuid,nodev shared:16 - tmpfs tmpfs rw

Then I run unshare -mU --map-root-user --propagation private /usr/bin/zsh to get a new shell inside a new namespace. Inside the newly created mount namespace I can't umount /tmp/IMJUSTTMP; umount just tells me it's not mounted. Yet cat /proc/self/mountinfo inside the new mount namespace lists it as a private mount:

290 263 0:32 / /tmp rw,nosuid,nodev - tmpfs tmpfs rw
302 290 0:37 / /tmp/IMJUSTTMP rw,relatime - tmpfs tmpfs rw,size=102400k

Then why do I get umount: /tmp/IMJUSTTMP: not mounted. when I try to umount /tmp/IMJUSTTMP inside the namespace?

I'm using 5.0.9-arch1-1-ARCH, with kernel.unprivileged_userns_clone = 1.

Experiment 2

After unshare -mU --map-root-user --propagation private /usr/bin/zsh, trying to create an overlayfs also fails.

mkdir -p /tmp/IMJUSTTMP/work
mkdir /tmp/IMJUSTTEST
mount -t tmpfs -o size=100m tmpfs /tmp/IMJUSTTMP
mount -t tmpfs -o size=200M tmpfs /tmp/IMJUSTTEST

These all succeed as expected, while both of the following get permission denied inside the namespace.

mount -t overlay -o "lowerdir=/home/xtricman,upperdir=/tmp/IMJUSTTMP/,workdir=/tmp/IMJUSTTMP/work" overlay /home/xtricman
mount -t overlay -o "lowerdir=/tmp/IMJUSTTEST,upperdir=/tmp/IMJUSTTMP,workdir=/tmp/IMJUSTTMP/work" overlay /mnt

Rough Guess of mine

I found these two questions: Inside a user namespace, why am I not allowed to remount a filesystem I have mounted? and Why can't I bind-mount "/" inside a user namespace? It seems that because I inherited the /tmp/IMJUSTTMP and /tmp mounts, I can't umount them even though I have full capabilities in the user namespace that owns the newly created mount namespace.

Question

Can anyone explain exactly what is going on in the two experiments? Is there any documentation describing the detailed kernel behavior of mounting and unmounting inside a mount namespace? What is the "superblock owner" mentioned in this kernel commit and in Why can't I bind-mount "/" inside a user namespace?

Best Answer

Yes :-). There are three distinct points here.

Experiment 1: Why do I get umount: /tmp/IMJUSTTMP: not mounted when I try to umount /tmp/IMJUSTTMP inside the namespace?

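/tmp/IMJUSTTMP was inherited from a more privileged mount namespace, so it is locked. umount(2) fails with EINVAL on a locked mount, and umount(8) reports that error as "not mounted". The third point below is the relevant one. From http://man7.org/linux/man-pages/man7/mount_namespaces.7.html: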

Restrictions on mount namespaces

Note the following points with respect to mount namespaces:

  • A mount namespace has an owner user namespace. A mount namespace whose owner user namespace is different from the owner user namespace of its parent mount namespace is considered a less privileged mount namespace.

  • When creating a less privileged mount namespace, shared mounts are reduced to slave mounts. (Shared and slave mounts are discussed below.) This ensures that mappings performed in less privileged mount namespaces will not propagate to more privileged mount namespaces.

  • Mounts that come as a single unit from more privileged mount are locked together and may not be separated in a less privileged mount namespace. (The unshare(2) CLONE_NEWNS operation brings across all of the mounts from the original mount namespace as a single unit, and recursive mounts that propagate between mount namespaces propagate as a single unit.)

  • The mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the "atime" flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME) settings become locked when propagated from a more privileged to a less privileged mount namespace, and may not be changed in the less privileged mount namespace.
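
To check which propagation state a mount ended up in, you can read the optional fields of its /proc/self/mountinfo line (shared:N, master:N, or nothing for a private mount). The following is only a minimal sketch; mount_propagation is a helper name made up for illustration, not an existing tool.

```shell
# Hypothetical helper: print the propagation tags of one mountinfo line,
# or "private" when the line carries no optional fields.
mount_propagation() {
    # Fields 1-6 are fixed (IDs, device, root, mount point, options);
    # optional fields run from field 7 up to the "-" separator.
    printf '%s\n' "$1" | awk '{
        tags = ""
        for (i = 7; i <= NF && $i != "-"; i++)
            tags = tags (tags == "" ? "" : " ") $i
        print (tags == "" ? "private" : tags)
    }'
}
```

Run against the two /tmp/IMJUSTTMP lines from the question, it should report shared:152 for the line taken outside the namespace and private for the line taken inside it. To inspect a live system, feed it each line of /proc/self/mountinfo in a while read loop.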

Experiment 2: Trying to create an overlayfs also fails

Attempts to make the mount operation safe for ordinary users are nothing new; LWN covered one patch set back in 2008. That work was never merged, but the effort to allow unprivileged mounts picked up in 2015, when Eric Biederman (along with others, Seth Forshee in particular) got serious about allowing user namespaces to perform filesystem mounts. The initial work was merged in 2016 for the 4.8 kernel, but it was known to not be a complete solution to the problem, so most filesystems can still only be mounted by users who are privileged in the initial namespace.

Unprivileged filesystem mounts, 2018 edition, LWN.net

The 2018 LWN article says filesystems that have been verified as "safe for use within user namespaces" are flagged as FS_USERNS_MOUNT. So we can easily search the kernel source to find which filesystems are allowed.
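
One way to do that search is a recursive grep over a kernel source checkout. This is a rough sketch: list_userns_mount_files is a made-up name, the path argument is assumed to point at a kernel tree you have checked out yourself, and --include assumes GNU or BSD grep.

```shell
# Hypothetical helper: list the C files under a kernel source tree
# that mention FS_USERNS_MOUNT.
# $1: path to a kernel source checkout (assumed; defaults to the current directory)
list_userns_mount_files() {
    grep -rl --include='*.c' 'FS_USERNS_MOUNT' "${1:-.}" | sort
}
```

For example, list_userns_mount_files ~/src/linux (path hypothetical) prints one file per line. On a 5.0-era tree I would not expect anything under fs/overlayfs in the output, which matches the permission denied from Experiment 2; as far as I know, overlayfs only gained the flag upstream in 5.11.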

What is the "superblock owner" as mentioned in this kernel commit and in the question "Why can't I bind-mount "/" inside a user namespace?"

The source code in the kernel commit you link to says that each superblock is considered to be owned by a specific user namespace. The owner is the user namespace which originally created the superblock.
