I believe you are still executing the command umount /old_root from within the old root, and therefore it is busy.
I once wrote a similar script, and the following worked for me:
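#!/bin/sh
# Mount the pseudo-filesystems the early environment needs.
mount -v -n -t proc -onodev,noexec,nosuid proc /proc
mount -v -n -t sysfs -onodev,noexec,nosuid sysfs /sys
# Mount the real root filesystem.
mount -v -t ext4 /dev/sdb1 /mnt/root
# Carry the already-mounted filesystems over into the new root.
mount --move /dev /mnt/root/dev/
mount --move /proc /mnt/root/proc/
mount --move /sys /mnt/root/sys/
echo "Switching root filesystem..."
cd /mnt/root
# Make the current directory the new root; the old root ends up at mnt/tmp inside it.
pivot_root . mnt/tmp/
# Replace this shell with the new init, running inside the new root.
exec chroot . /sbin/init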
Then, inside the new root, the first command the new init executes is umount /mnt/tmp/.
It sounds like the alternative implementation of pivot_root() would put the calling process in a new, altered mount namespace. Is that a valid reading?
No. IMO this is not very clear, but there is a much more consistent and correct reading.
The essential part of pivot_root(), which must be the same in either implementation, is:
pivot_root() moves the root filesystem of the calling process to the directory put_old and makes new_root the new root filesystem of the calling process.
The essential part of pivot_root() is not limited to the calling process. The operation described in this quote works on the mount namespace of the calling process. It will affect the view of all the processes in the same mount namespace.
Consider the effect the essential change has on a second process - or kernel thread - whose working directory was the old root filesystem. Its current directory will still be the old root filesystem. This will keep the /put_old mount point busy, and so it will not be possible to unmount the old root filesystem.
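To see which processes are keeping a directory busy in this way, one option is to read the cwd and root symlinks under /proc. The following is only a sketch: find_holders is a hypothetical helper name, it assumes a Linux /proc, it compares paths as they appear from the caller's own root, and it does not look at open files or nested mounts, which can also keep a mount point busy.

```python
import os


def find_holders(path):
    """Return sorted PIDs whose cwd or root lies at or below `path`.

    Hypothetical helper: it only reads the /proc/<pid>/cwd and
    /proc/<pid>/root symlinks and skips processes it may not inspect.
    """
    if not os.path.isdir("/proc"):
        return []  # no procfs available, nothing to report
    target = os.path.realpath(path)
    prefix = target.rstrip("/") + "/"
    holders = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        for link in ("cwd", "root"):
            try:
                resolved = os.readlink(f"/proc/{entry}/{link}")
            except OSError:
                # The process exited, or we lack permission to look at it.
                continue
            if resolved == target or resolved.startswith(prefix):
                holders.add(int(entry))
    return sorted(holders)


if __name__ == "__main__":
    # The calling process itself holds its own working directory.
    print(find_holders(os.getcwd()))
```

Run as root, something like find_holders("/mnt/tmp") would list the processes whose working directory or root is still inside the old root (the path is just the one used in the script earlier).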
If you control this second process, you resolve this, as per the manpage, by setting its working directory to new_root before pivot_root() is called. After pivot_root() is called, its current directory will still be the new root filesystem.
So process S(ystemd) has been configured to signal process P(lymouth) to change its working directory before S calls pivot_root(). No problem. But we also have kernel threads, which start in /. The current implementation of pivot_root() takes care of the kernel threads for us; it is equivalent to setting the working directories of kernel threads and any other process to new_root before the essential part of pivot_root().
Except, the current implementation of pivot_root() only changes the working directory of a process if the old working directory was /. So it's actually quite easy to see the difference this makes:
$ unshare -rm
# cd /tmp # work in a subdir instead of '/', and pivot_root() will not change it
# /bin/pwd
/tmp
# mount --bind /new-root /new-root
# pivot_root /new-root /new-root/mnt
# /bin/pwd
/mnt/tmp # see below: if pivot_root had not updated our current chroot, this would still show /tmp
vs.
$ unshare -rm
# cd /
# /bin/pwd
/
# ls -lid .
2 dr-xr-xr-x. 19 nfsnobody nfsnobody 4096 Jun 13 01:17 .
# ls -lid /new-root
6424395 dr-xr-xr-x. 20 nfsnobody nfsnobody 4096 May 10 12:53 /new-root
# mount --bind /new-root /new-root
# pivot_root /new-root /new-root/mnt
# /bin/pwd
/
# ls -lid .
6424395 dr-xr-xr-x. 20 nobody nobody 4096 May 10 12:53 .
# ls -lid /
6424395 dr-xr-xr-x. 20 nobody nobody 4096 May 10 12:53 /
# ls -lid /mnt
2 dr-xr-xr-x. 19 nobody nobody 4096 Jun 13 01:17 /mnt
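The rule these two sessions illustrate can be written down as a toy model. This is not kernel code, only a sketch of the behaviour described above: every name here (Proc, toy_pivot_root, shown_path, the "oldfs"/"newfs" labels) is made up for illustration, and a location is modelled as a (filesystem, path) pair.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    root: tuple  # (filesystem, path) the process treats as "/"
    cwd: tuple   # (filesystem, path) of its working directory


def toy_pivot_root(procs, old_root, new_root):
    """Toy model: repoint only references that are exactly the old root."""
    for p in procs:
        if p.root == old_root:
            p.root = new_root
        if p.cwd == old_root:
            p.cwd = new_root
    return procs


def shown_path(loc, put_old="/mnt"):
    """Path as seen from the new root, with the old root mounted at put_old."""
    fs, path = loc
    if fs == "newfs":
        return path
    return put_old + path.rstrip("/")


old, new = ("oldfs", "/"), ("newfs", "/")
procs = [
    Proc("started-in-root", root=old, cwd=old),
    Proc("started-in-tmp", root=old, cwd=("oldfs", "/tmp")),
]
toy_pivot_root(procs, old, new)

for p in procs:
    print(p.name, shown_path(p.cwd))
```

In this model the process that started in / follows the pivot, while the one that started in /tmp keeps its directory on the old filesystem and so sees it under /mnt/tmp, matching the two /bin/pwd results above.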
Now that I understand what's happening with the working directory, I find it easier to understand what's happening with chroot(). The current chroot of the process which calls pivot_root() may be a reference to the original root filesystem, just as its current working directory may be.
Note, if you do chdir()+pivot_root() but forget to chroot(), your current directory will be outside your current chroot. When your current directory is outside your current chroot, things get quite confusing. You probably don't want to run your program in this state.
# cd /
# python
>>> import os
>>> os.chroot("/new-root")
>>> os.system("/bin/pwd")
(unreachable)/
0
>>> os.getcwd()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory
>>> os.system("ls -l ./proc/self/cwd")
lrwxrwxrwx. 1 root root 0 Jun 17 13:46 ./proc/self/cwd -> /
0
>>> os.system("ls -lid ./proc/self/cwd/")
2 dr-xr-xr-x. 19 root root 4096 Jun 13 01:17 ./proc/self/cwd/
0
>>> os.system("ls -lid /")
6424395 dr-xr-xr-x. 20 root root 4096 May 10 12:53 /
0
POSIX does not specify the result of pwd or getcwd() in this situation :). POSIX gives no warning that you might get a "No such file or directory" (ENOENT) error from getcwd(). Linux manpages point out this error as being possible, if the working directory was unlinked (e.g. with rm). I think this is a very good parallel.
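The unlinked-directory case is easy to reproduce without root, which makes it a convenient way to see the same getcwd() failure. A minimal sketch, assuming Linux; getcwd_errno_after_rmdir is a name made up for this example:

```python
import errno
import os
import tempfile


def getcwd_errno_after_rmdir():
    """Remove the current directory, then return the errno getcwd() fails with.

    Returns None if getcwd() unexpectedly succeeds.
    """
    saved = os.getcwd()
    doomed = tempfile.mkdtemp()
    try:
        os.chdir(doomed)
        os.rmdir(doomed)  # the directory is gone, but it is still our cwd
        try:
            os.getcwd()
        except OSError as exc:
            return exc.errno
        return None
    finally:
        os.chdir(saved)  # always go back to a directory that exists


if __name__ == "__main__":
    err = getcwd_errno_after_rmdir()
    print(errno.errorcode.get(err, err))
```

On Linux the returned value should be errno.ENOENT, the same error the Python traceback above reports for a working directory outside the current chroot.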
Best Answer
Both the current working directory, and the root, are reset to the root filesystem of the entered mount namespace.
For example, I have tested that I can escape chroot by running nsenter -m --target $$.
(Reminder: chroot is easy to escape when you are still root. man chroot documents the well-known way of doing this.)
Source
https://elixir.bootlin.com/linux/latest/source/fs/namespace.c?v=4.17#L3507
Note: current means the current task - the current thread/process. ->fs will be the filesystem data of that task - this is shared between tasks that are threads within the same process. E.g. you will see below that changing the working directory is an operation on ->fs. E.g. changing the working directory affects all threads of the same process. POSIX-compatible threads like this are implemented using the CLONE_FS flag of clone().
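The shared working directory between threads can be observed from user space. A small sketch, assuming Python's threading module creates ordinary POSIX threads (as it does on Linux); chdir_in_thread is a hypothetical helper name:

```python
import os
import tempfile
import threading


def chdir_in_thread(path):
    """Call chdir() in a worker thread; return the cwd the caller sees afterwards."""
    worker = threading.Thread(target=os.chdir, args=(path,))
    worker.start()
    worker.join()
    return os.getcwd()


if __name__ == "__main__":
    saved = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        print(chdir_in_thread(scratch))  # the main thread now sees the new cwd
        os.chdir(saved)  # leave the scratch directory before it is removed
```

The main thread never calls chdir() itself, yet its working directory changes, because both threads share the same filesystem data.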
Here is the line in question: