Unmount sys/fs/cgroup/systemd after chroot, without rebooting

chroot, filesystems, mount, sysfs, systemd

Background: I am exploring how to copy an ordinary LVM-on-LUKS Debian 9 ("Stretch") installation from a thumb drive (the "source drive") onto a ZFS-formatted drive (the "target drive") in order to achieve a ZFS-on-LUKS installation. My process is based on this HOWTO.* I think the ZFS aspect is irrelevant to the issue I would like help with, but I am mentioning it just in case it matters.

As part of my process, while Stretch is running from the source drive, I mount the target ZFS root (/) filesystem at /mnt. I then recursively bind:

  • /dev to /mnt/dev
  • /proc to /mnt/proc
  • /sys to /mnt/sys.

I then chroot into /mnt.

(In the future, when I am chrooted into /mnt, I intend to run update-initramfs, update-grub, etc., to configure the contents of the /boot partition.)

I then exit the chroot, and my trouble begins. I find that I can unmount /mnt/dev and /mnt/proc, but not /mnt/sys. The latter refuses to unmount because it contains /mnt/sys/fs/cgroup/systemd, which the system for some reason thinks is "in use". Reformatting the ZFS drive and rebooting fixes the problem, but hugely slows the iterations of my learning and documentation process.

My questions are:

– How can I unmount /mnt/sys after the chroot, without rebooting?

– Is the failure (umount: /mnt/sys/fs/cgroup/systemd: target is busy) expected? If not, against which piece of software should I file a bug report: umount, cgroups, systemd, the Linux kernel, or something else?

Here is (I think) a minimal working example. (If you are having difficulty reproducing this and think I might have missed out a step, let me know.) First, the boilerplate:

# Activate the ZFS kernel module
/sbin/modprobe zfs

# Set variables
BOOT_POOL=bpool
ROOT_POOL=rpool
DIRS_TO_COPY=(boot bin etc home lib lib64 opt root sbin srv usr var)
FILES_TO_COPY=(initrd.img initrd.img.old vmlinuz vmlinuz.old)
VIRTUAL_FILESYSTEM_DIRS=(dev proc sys)

## Partition target drive
# 1MB BIOS boot partition
sgdisk -a2048 -n1:2048:4095     -t1:EF02 $1 -c 1:"bios_boot_partition"
wait
# 510MB partition for /boot ZFS filesystem
sgdisk -a2048 -n2:4096:1052671  -t2:BF07 $1 -c 2:"zfs_boot_partition"
wait
# Remaining drive space, except the last 510MiB in case of future need:
# partition to hold the LUKS container and the root ZFS filesystem
sgdisk -a2048 -n3:1052672:-510M -t3:8300 $1 -c 3:"luks_zfs_root_partition"
wait

# Before proceeding, ensure /dev/disk/by-id/ knows of these new partitions
partprobe
wait

# Create the /boot pool
zpool create -o ashift=12            \
             -O atime=off            \
             -O canmount=off         \
             -O compression=lz4      \
             -O normalization=formD  \
             -O mountpoint=/boot     \
             -R /mnt                 \
             $BOOT_POOL "$1"-part2
wait

# Create the LUKS container for the root pool
cryptsetup luksFormat "$1"-part3               \
                      --hash sha512            \
                      --cipher aes-xts-plain64 \
                      --key-size 512
wait

# Open LUKS container that will contain the root pool
cryptsetup luksOpen "$1"-part3 "$DRIVE_SHORTNAME"3_crypt
wait

# Create the root pool
zpool create -o ashift=12           \
             -O atime=off           \
             -O canmount=off        \
             -O compression=lz4     \
             -O normalization=formD \
             -O mountpoint=/        \
             -R /mnt                \
             $ROOT_POOL /dev/mapper/"$DRIVE_SHORTNAME"3_crypt
wait

# Create ZFS datasets for the root ("/") and /boot filesystems
zfs create -o canmount=noauto -o mountpoint=/      "$ROOT_POOL"/debian
zfs create -o canmount=noauto -o mountpoint=/boot  "$BOOT_POOL"/debian

# Mount the root ("/") and /boot ZFS datasets
zfs mount "$ROOT_POOL"/debian
zfs mount "$BOOT_POOL"/debian

# Create datasets for subdirectories
zfs create                     -o setuid=off              "$ROOT_POOL"/home
zfs create -o mountpoint=/root                            "$ROOT_POOL"/home/root
zfs create -o canmount=off     -o setuid=off  -o exec=off "$ROOT_POOL"/var
zfs create -o com.sun:auto-snapshot=false                 "$ROOT_POOL"/var/cache
zfs create                                                "$ROOT_POOL"/var/log
zfs create                                                "$ROOT_POOL"/var/mail
zfs create                                                "$ROOT_POOL"/var/spool
zfs create -o com.sun:auto-snapshot=false     -o exec=on  "$ROOT_POOL"/var/tmp
zfs create                                                "$ROOT_POOL"/srv
zfs create -o com.sun:auto-snapshot=false     -o exec=on  "$ROOT_POOL"/tmp

# Set the `bootfs` property. ***TODO: IS THIS CORRECT???***
zpool set bootfs="$ROOT_POOL"/debian "$ROOT_POOL"

# Set correct permission for tmp directories
chmod 1777 /mnt/tmp
chmod 1777 /mnt/var/tmp

And here's the core part of the issue:

# Copy Debian install from source drive to target drive
for i in "${DIRS_TO_COPY[@]}"; do 
    rsync --archive --quiet --delete /"$i"/ /mnt/"$i"
done
for i in "${FILES_TO_COPY[@]}"; do
    cp -a /"$i" /mnt/
done
for i in "${VIRTUAL_FILESYSTEM_DIRS[@]}"; do
    # Make mountpoints for virtual filesystems on target drive
    mkdir /mnt/"$i"
    # Recursively bind the virtual filesystems from source environment to the
    # target. N.B. This is using `--rbind`, not `--bind`.
    mount --rbind /"$i"  /mnt/"$i"
done

# `chroot` into the target environment
chroot /mnt /bin/bash --login

# (Manually exit from the chroot)

# Delete copied files
for i in "${DIRS_TO_COPY[@]}" "${FILES_TO_COPY[@]}"; do
    rm -r /mnt/"$i"
done

# Remove recursively bound virtual filesystems from target
for i in "${VIRTUAL_FILESYSTEM_DIRS[@]}"; do
    # First unmount them
    umount --recursive --verbose --force /mnt/"$i" || sleep 0
    wait
    # Then delete their mountpoints
    rmdir /mnt/"$i"
    wait
done

At this last step, I get:

umount: /mnt/sys/fs/cgroup/systemd: target is busy
    (In some cases useful info about processes that
     use the device is found by lsof(8) or fuser(1).)

In case it helps: findmnt shows the full sys tree mounted twice: once at /sys and identically at /mnt/sys.
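For anyone reproducing this, the propagation mode of the mounts involved can be inspected directly with findmnt (a diagnostic sketch, shown against the host's /sys, whose flags the `--rbind` copy inherits; the PROPAGATION column needs a reasonably recent util-linux):

```shell
# Show whether each mount is shared, private, or slave. A recursive
# bind of /sys inherits these flags, so if /sys/fs/cgroup/systemd is
# "shared" on the host, the copy under /mnt/sys starts out shared too.
findmnt -no TARGET,PROPAGATION /sys || true
findmnt -no TARGET,PROPAGATION /sys/fs/cgroup/systemd || true
```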

* Debian Jessie Root on ZFS, CC BY-SA 3.0, by Richard Laager and George Melikov.

Best Answer

You need to add mount --make-rslave /mnt/"$i" immediately after the mount --rbind command, to set the correct propagation flags for those mount points.

With slave propagation, mount and unmount events flow from the host into the bound copy but not back out of it. This protects the host from changes made inside the chroot environment, and prevents blocking situations like yours, where the shared cgroup mount keeps /mnt/sys busy after you exit the chroot.
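Applied to the loop from the question, the fix can be sketched as follows (the function merely defines the steps; actually running it requires root on the source system):

```shell
# Bind the virtual filesystems into the target, and immediately mark
# each bound subtree as a slave so that mount/unmount events propagate
# from the host into the copy but never back out of it. This is what
# allows `umount --recursive /mnt/sys` to succeed after the chroot.
bind_virtual_filesystems() {
    target=$1
    for i in dev proc sys; do
        mkdir -p "$target/$i"
        mount --rbind       /"$i" "$target/$i"
        mount --make-rslave "$target/$i"   # the missing step
    done
}
```

Invoked as, e.g., `bind_virtual_filesystems /mnt` before the chroot.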
