Linux – How does the system shutdown of a linux kernel work internally

linux-kernel shutdown

I have a rough idea of how userspace and the init system (be it classic SysV init, Upstart or systemd) work at system shutdown.
(Essentially there is an ordered succession of "Stop!", "Please stop now, really", "Process, I need to kill you to stop", plus a lot of waiting.)

However, I know very little about how system shutdown works in the kernel (where surely there is also a lot to do).

I tried to look into the kernel documentation https://www.kernel.org/doc/htmldocs/ and even used the NSA's pal search tool to give me a head start on finding out how it works.

Also I searched on SE U+L and found nothing (did I overlook it?)

Anyway, the question, though potentially a bit challenging, would merit an answer in this Q&A network, as I assume more people are interested in getting a sketch of what happens in the Linux kernel at shutdown.

Potentially there is also a chance to link to some more detailed explanations.

An answer might include which system calls and which kernel signals are used.

https://github.com/torvalds/linux/blob/b3a3a9c441e2c8f6b6760de9331023a7906a4ac6/arch/x86/kernel/reboot.c
seems to be the x86-specific file related to reboot (already close to shutdown, eh?)

maybe the snippet found here http://lxr.free-electrons.com/source/kernel/reboot.c#L176
can be used to give an explanation

void kernel_power_off(void)
{
        kernel_shutdown_prepare(SYSTEM_POWER_OFF);
        if (pm_power_off_prepare)
                pm_power_off_prepare();
        migrate_to_reboot_cpu();
        syscore_shutdown();
        pr_emerg("Power down\n");
        kmsg_dump(KMSG_DUMP_POWEROFF);
        machine_power_off();
}
EXPORT_SYMBOL_GPL(kernel_power_off);

Best Answer

The main resources to understand how the Linux kernel works are:

  1. The documentation.
  2. Linux Weekly News articles.
  3. The source. This is a complex beast which is a little easier to apprehend through LXR, the Linux cross-reference. The LXR variant running on lxr.linux.no is nicer than others, but it's often down.

In this case, I can't find anything centrally relevant in the documentation or on LWN, so LXR it is.

The last thing the userland code does is call the reboot system call. It takes 4 arguments, so search for SYSCALL_DEFINE4(reboot on LXR, which leads to kernel/reboot.c. After checking the caller's privileges and the arguments, the syscall entry point calls one of several functions: kernel_restart to reboot, kernel_halt to halt in a tight loop, kernel_power_off to power off the system, kernel_kexec to replace the kernel with a new one (if compiled in), or hibernate to save the memory to disk before powering off.
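
The dispatch described above can be modeled in user space. This is a minimal sketch, not the kernel's code: the numeric values mirror the LINUX_REBOOT_MAGIC* and LINUX_REBOOT_CMD_* constants from the kernel's include/uapi/linux/reboot.h, but the short macro names, the enum and the function dispatch_reboot are hypothetical names made up for illustration. It leaves out the privilege check, the alternative values the kernel accepts for the second magic number, and the commands the text does not mention.

```c
#include <stdint.h>

/* Values as in include/uapi/linux/reboot.h; the short names are made up. */
#define MAGIC1          0xfee1deadu  /* LINUX_REBOOT_MAGIC1 */
#define MAGIC2          672274793u   /* LINUX_REBOOT_MAGIC2 */
#define CMD_RESTART     0x01234567u  /* LINUX_REBOOT_CMD_RESTART */
#define CMD_HALT        0xCDEF0123u  /* LINUX_REBOOT_CMD_HALT */
#define CMD_POWER_OFF   0x4321FEDCu  /* LINUX_REBOOT_CMD_POWER_OFF */
#define CMD_KEXEC       0x45584543u  /* LINUX_REBOOT_CMD_KEXEC */
#define CMD_SW_SUSPEND  0xD000FCE2u  /* LINUX_REBOOT_CMD_SW_SUSPEND */

/* Which kernel function the syscall would hand control to (illustrative). */
enum reboot_target {
    T_INVALID,            /* bad magic or unknown command: rejected */
    T_KERNEL_RESTART,
    T_KERNEL_HALT,
    T_KERNEL_POWER_OFF,
    T_KERNEL_KEXEC,
    T_HIBERNATE
};

/* Simplified model: validate the magic numbers, then pick the target. */
enum reboot_target dispatch_reboot(uint32_t magic1, uint32_t magic2, uint32_t cmd)
{
    if (magic1 != MAGIC1 || magic2 != MAGIC2)
        return T_INVALID;

    switch (cmd) {
    case CMD_RESTART:    return T_KERNEL_RESTART;
    case CMD_HALT:       return T_KERNEL_HALT;
    case CMD_POWER_OFF:  return T_KERNEL_POWER_OFF;
    case CMD_KEXEC:      return T_KERNEL_KEXEC;
    case CMD_SW_SUSPEND: return T_HIBERNATE;
    default:             return T_INVALID;
    }
}
```

For example, dispatch_reboot(MAGIC1, MAGIC2, CMD_POWER_OFF) yields T_KERNEL_POWER_OFF, while a call with wrong magic numbers is rejected regardless of the command.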

kernel_restart, kernel_halt and kernel_power_off are fairly similar:

  1. Go through reboot_notifier_list, which is a list of hooks that kernel components can register to execute code on powerdown. Only a few drivers need to execute code at this stage, mostly watchdogs.
  2. Set the system_state variable.
  3. Disable usermode-helper, to ensure that no user code will be started anymore. (There can still be existing processes at this stage.)
  4. Call device_shutdown to release or power down all devices on the system. A lot of drivers hook into this stage.
    Note that any filesystems that are still mounted at this point are effectively forcibly unmounted. The caller of the system call takes responsibility for any clean unmounting.
  5. For power off only, if ACPI is configured in, possibly execute code to prepare going into ACPI state S5 (soft power off).
  6. In a multi-CPU machine, the code could be running on any CPU, whichever invoked the system call. migrate_to_reboot_cpu takes care to switch to one particular CPU and prevent the scheduler from dispatching code on other CPUs. After this point, only a single CPU is running.
  7. syscore_shutdown calls the shutdown method of registered syscore operations. I think this is mostly about disabling interrupts; few hooks have a shutdown method.
  8. Log an informational message: the swan song.
  9. Finally go to rest in some machine-dependent way by calling machine_restart, machine_halt or machine_power_off.
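
The ordering of the steps above can be written down as a small user-space model. This is only a sketch for keeping the sequence straight: the enum names and the function model_shutdown are hypothetical, nothing here touches real hardware, and the power-off preparation step is recorded unconditionally for power off even though the kernel only runs it when a prepare hook is set.

```c
#include <stddef.h>

enum shutdown_mode { MODE_RESTART, MODE_HALT, MODE_POWER_OFF };

enum shutdown_step {
    STEP_REBOOT_NOTIFIERS,         /* walk reboot_notifier_list */
    STEP_SET_SYSTEM_STATE,         /* update system_state */
    STEP_DISABLE_USERMODE_HELPER,  /* no new user code from now on */
    STEP_DEVICE_SHUTDOWN,          /* device_shutdown() */
    STEP_POWER_OFF_PREPARE,        /* power off only */
    STEP_MIGRATE_TO_REBOOT_CPU,    /* continue on a single CPU */
    STEP_SYSCORE_SHUTDOWN,         /* syscore_shutdown() */
    STEP_LOG_MESSAGE,              /* final log line */
    STEP_MACHINE_OP                /* machine_restart/halt/power_off */
};

/*
 * Record the steps for the given mode into trace[] in order.
 * Returns the number of steps written, or 0 if the buffer is too small.
 */
size_t model_shutdown(enum shutdown_mode mode,
                      enum shutdown_step *trace, size_t cap)
{
    size_t needed = (mode == MODE_POWER_OFF) ? 9 : 8;
    size_t n = 0;

    if (trace == NULL || cap < needed)
        return 0;

    trace[n++] = STEP_REBOOT_NOTIFIERS;
    trace[n++] = STEP_SET_SYSTEM_STATE;
    trace[n++] = STEP_DISABLE_USERMODE_HELPER;
    trace[n++] = STEP_DEVICE_SHUTDOWN;
    if (mode == MODE_POWER_OFF)
        trace[n++] = STEP_POWER_OFF_PREPARE;
    trace[n++] = STEP_MIGRATE_TO_REBOOT_CPU;
    trace[n++] = STEP_SYSCORE_SHUTDOWN;
    trace[n++] = STEP_LOG_MESSAGE;
    trace[n++] = STEP_MACHINE_OP;
    return n;
}
```

Calling model_shutdown(MODE_POWER_OFF, trace, 9) fills nine entries, while restart and halt produce eight because they skip the power-off preparation.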

The hibernation code goes through the following steps:

  1. Iterate through the power management hooks.
  2. Sync filesystems.
  3. Freeze all user code.
  4. Prevent device hotplugging.
  5. Dump the system state to the swap space.
  6. If everything succeeded, hibernate the hardware. This can involve calling kernel_restart, kernel_halt or kernel_power_off, or some platform-specific hibernation method.
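
The "if everything succeeded" condition in the last step can be illustrated with a tiny step runner. This is a sketch under assumptions, not the kernel's hibernation code: run_hibernation, hib_step_fn and the stand-in callbacks are hypothetical names, and the sketch simply stops at the first failing step without modeling whatever cleanup the kernel performs.

```c
#include <stddef.h>

/* A preparation step; returns 0 on success, nonzero on failure. */
typedef int (*hib_step_fn)(void);

/*
 * Run the preparation steps in order. The final power-down callback is
 * invoked only if every step succeeded. Returns the number of steps that
 * completed successfully.
 */
size_t run_hibernation(const hib_step_fn *steps, size_t count,
                       void (*power_down)(void))
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (steps[i]() != 0)
            return i;      /* abort: the hardware is not powered down */
    }
    power_down();
    return count;
}

/* Stand-in callbacks for experimenting with the runner (hypothetical). */
int powered_down = 0;
int step_ok(void)   { return 0; }
int step_fail(void) { return -1; }
void mark_power_down(void) { powered_down = 1; }
```

With the step list { step_ok, step_fail, step_ok } the runner returns 1 and never calls the power-down callback. With three succeeding steps it returns 3 and powered_down becomes 1.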

A different way to shut the system down is machine_emergency_restart. This is invoked by the magic SysRq key B. The O key works differently: it calls kernel_power_off.
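
Besides the keyboard, the same SysRq actions can be requested by writing the key letter to /proc/sysrq-trigger. Below is a minimal sketch of that. The helper name write_sysrq_key is hypothetical, and the path is a parameter so that the function can be tried harmlessly against an ordinary file.

```c
#include <stdio.h>

/*
 * Write a single SysRq key letter to the given trigger file.
 * Returns 0 on success, -1 on failure.
 */
int write_sysrq_key(const char *trigger_path, char key)
{
    FILE *f = fopen(trigger_path, "w");

    if (f == NULL)
        return -1;
    if (fputc(key, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose flushes the buffer; a failure here means the write was lost. */
    return fclose(f) == 0 ? 0 : -1;
}
```

On a real system, running write_sysrq_key("/proc/sysrq-trigger", 'b') as root restarts the machine immediately, without syncing or unmounting filesystems, so only try it on a machine you can afford to reset.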

The system can also shut down due to a panic, i.e. an unrecoverable error. Panicking attempts to log a message, then reboot the system (either via a hardware watchdog or an emergency restart).
