Linux – How to change Linux context-switch frequency


How is it possible to change the Linux (linaro, ubuntu, debian) context-switch frequency?

I am okay with trading a less responsive system for a more efficient one.

EDIT1: I have a main process which I want to run as fast as possible (maximal clock cycles per second), so I thought of reducing the context-switch frequency (i.e. increasing the timeslice). The question is how to do it, and whether there would be a significant effect. Can I calculate the cost of a context switch? Meaning, if I double the timeslice, can I estimate the performance gain in % for the main process I care about?

Best Answer

If your task is the only process requesting time on a specific CPU, there will be no context switches between tasks :-). But the CPU may still be interrupted, causing a context switch into the kernel and back. And one possible cause is the pre-emption timer, checking if there is another task to run on this CPU...
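One way to see how often a task is actually switched out is to read the per-process counters that Linux exposes in /proc/<pid>/status. The following is a minimal sketch, not a benchmark: the function names are made up for illustration, and it assumes a Linux /proc filesystem that reports the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields.

```python
import os

FIELDS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")

def parse_ctxt_switches(status_text):
    """Extract the context-switch counters from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in FIELDS:
            counts[key] = int(value.strip())
    return counts

def read_ctxt_switches(pid="self"):
    """Read the counters for a process (the calling process by default)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ctxt_switches(f.read())

if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    before = read_ctxt_switches()
    sum(i * i for i in range(2_000_000))  # some CPU-bound work
    after = read_ctxt_switches()
    for key in FIELDS:
        # Nonvoluntary switches are the ones caused by pre-emption.
        print(key, after.get(key, 0) - before.get(key, 0))
```

If the nonvoluntary count barely moves while your task runs, the task is rarely being pre-empted by other tasks, and a longer timeslice is unlikely to help much.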

Linux can avoid generating any pre-emption timer interrupts on a CPU when there is no reason to do so. See CONFIG_NO_HZ_FULL. To use this feature, it must have been enabled when the kernel was built, and it must also be enabled with a boot option. The kernel documentation says:

By default, no CPU will be an adaptive-ticks CPU. The "nohz_full=" boot parameter specifies the adaptive-ticks CPUs. For example, "nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks CPUs. Note that you are prohibited from marking all of the CPUs as adaptive-tick CPUs [...]
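To make the CPU-list syntax in that quote concrete, here is a small sketch that expands a list such as "1,6-8" and compares it with what the running kernel reports. It assumes the kernel exposes /sys/devices/system/cpu/nohz_full (the file may be absent on kernels built without NO_HZ_FULL), and it deliberately handles only plain numbers and ranges, not the stride syntax the kernel also accepts. The function names are illustrative.

```python
import os

def parse_cpu_list(spec):
    """Expand a simple CPU list such as "1,6-8" into [1, 6, 7, 8]."""
    cpus = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def read_nohz_full_cpus(path="/sys/devices/system/cpu/nohz_full"):
    """Return the adaptive-ticks CPUs, or None if the file is missing."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        text = f.read()
    try:
        return parse_cpu_list(text)
    except ValueError:
        # Content this sketch cannot parse is treated as "no CPUs".
        return []

if __name__ == "__main__":
    print("example list:", parse_cpu_list("1,6-8"))
    print("running kernel:", read_nohz_full_cpus())
```

An empty list or None from read_nohz_full_cpus() suggests that either the boot parameter was not given or the kernel was built without the feature.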

LWN.net says "according to Ingo Molnar, as much as 1% of the CPU's time will be saved" for adaptive-ticks CPUs. The kernel documentation lists six different costs of enabling this, and there is also a list of "KNOWN ISSUES".

This gain is relatively small, particularly compared to the potential throughput gains of reducing the frequency of context-switches between multiple tasks, as referenced in this answer: How to change the length of time-slices used by the Linux CPU scheduler?

Small print: these measurements pre-date Spectre, Meltdown, KPTI and x86 ASID support :-(. And I guess they also apply to somewhat older hardware. Ask a kernel expert or run your own measurements on how the cost of context-switches has changed on your specific kernel version and hardware... PTI was largely supposed to be mitigated by ASID, except for software that calls into the kernel very frequently, the main example being databases. But I don't have a good grasp on the numbers.
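As for estimating the gain from doubling the timeslice: a back-of-envelope model is possible, but only under strong assumptions. The sketch below assumes that several runnable tasks compete for the CPU, that every timeslice ends in exactly one switch, and that each switch has a fixed total cost (direct cost plus cache and TLB refill). The numbers in the example are placeholders, not measurements; you would need to measure the switch cost on your own kernel and hardware.

```python
def switch_overhead_fraction(timeslice_us, switch_cost_us):
    """Fraction of CPU time lost if every timeslice ends in one switch."""
    return switch_cost_us / (timeslice_us + switch_cost_us)

def gain_from_longer_timeslice(timeslice_us, switch_cost_us, factor=2.0):
    """Relative increase in useful CPU time when the timeslice grows by factor."""
    useful_before = 1.0 - switch_overhead_fraction(timeslice_us, switch_cost_us)
    useful_after = 1.0 - switch_overhead_fraction(timeslice_us * factor, switch_cost_us)
    return useful_after / useful_before - 1.0

if __name__ == "__main__":
    # Hypothetical inputs: a 4 ms timeslice and a 30 us total switch cost.
    timeslice_us, cost_us = 4000.0, 30.0
    overhead = switch_overhead_fraction(timeslice_us, cost_us)
    gain = gain_from_longer_timeslice(timeslice_us, cost_us)
    print(f"overhead before: {overhead:.2%}, gain from doubling: {gain:.2%}")
```

In this model the gain can never exceed the overhead you started with, so if switching costs well under 1% of CPU time today, doubling the timeslice cannot buy more than that.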

Molnar's hope in the original RFC patch was that with time, it "will likely be enabled by most Linux distros". I notice Fedora 28 provides a default kernel built with NO_HZ_FULL support. Debian 9 does not, however.
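To check whether your own kernel was built with this option, you can look at its build configuration. The sketch below assumes the configuration is available either as /boot/config-<release> (a common distro convention) or as /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC); neither is guaranteed to exist. The helper names are illustrative.

```python
import gzip
import os
import platform

def config_option(config_text, name):
    """Return the value of a kernel config option, or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None  # absent, or listed as "# NAME is not set"

def load_kernel_config():
    """Return the running kernel's config text, or None if it cannot be found."""
    candidates = [f"/boot/config-{platform.release()}", "/proc/config.gz"]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt") as f:
                return f.read()
        except OSError:
            continue
    return None

if __name__ == "__main__":
    text = load_kernel_config()
    if text is None:
        print("kernel config not found")
    else:
        print("CONFIG_NO_HZ_FULL =", config_option(text, "CONFIG_NO_HZ_FULL"))
```

A result of "y" means the support is compiled in; the nohz_full= boot parameter is still needed to actually use it.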


More recently, Linux v4.17 removed a residual 1 Hz timer tick from the nohz_full CPUs. I imagine the effect on throughput is quite small :-), but I've been trying to follow the status of NO_HZ_FULL benefits when there are multiple runnable processes on a CPU:

once we reach 0 Hz we can [then] remove the periodic tick assumption from nr_running>=2 as well, by essentially interrupting busy tasks only as frequently as the sched_latency constraints require us to do - once every 4-40 msecs, depending on nr_running.

This is a bit confusing, as pre-emption already started using a separate, more precise tick back in v2.6.25-rc1, commit 8f4d37ec073c, "sched: high-res preemption tick". (Found via this comment on the same LWN.net article: https://lwn.net/Articles/549754/)
