Linux – What are the effects, if any, of scheduler priorities and policies for threads in an uncontended cpuset

cgroups, linux, priority, scheduling

I have a Linux system where we have used cgroups to create two cpu_exclusive cpusets, A and B, and where we have migrated all user threads and all unbound kernel threads to a cgroup attached to cpuset A. Things running in cpuset A have varying scheduler policies and varying priorities, and there are many more threads running in cpuset A than there are cores in cpuset A.

There is also some small number of very active processes attached to cpuset B, where the total number of user threads across these processes is never greater than the number of cores exclusively available in cpuset B. The goal is to shield these important tasks running in cpuset B from other activity on the machine and to minimize processing latency.

In such a setup, does the scheduling policy/priority of the user threads running in cpuset B have any observable effect? Stated differently: would changing the scheduling policy of the B cpuset threads from the default SCHED_OTHER to SCHED_FIFO or SCHED_RR have any consequences, good or bad?

It seems like the answer should be 'no', since the scheduler should be able to assign each thread running in cpuset B its own dedicated core, so there would be nothing to prioritize or schedule, and so the policy and relative priority of the B cpuset threads wouldn't matter. On the other hand, there are the bound kernel threads and the 'scheduler domain' aspects to worry about, and probably other things I have not considered.

Do the scheduling policies and priorities of threads running in an overprovisioned exclusive cpuset matter in any practical sense?

Best Answer

The time slice matters for CPU-intensive jobs that depend on cache persistence, unless you pin each PID to a particular core. The SCHED_BATCH policy lets a thread run longer before it is preempted, which can improve performance by up to 300% in some cases at the cost of interactive responsiveness. SCHED_RR has the opposite effect: its smaller time slices reduce throughput but improve real-time responsiveness.
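Before changing anything, it may help to check what a thread in set B is actually running with. The following is a minimal Python sketch, assuming Linux and the standard `os` module; the helper name `describe_scheduling` is made up for illustration. It reports the policy, the static priority, the round-robin interval the kernel reports for the task, and the CPUs the task is allowed to run on.

```python
import os

# Readable names for the Linux policy constants exposed by the os module.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_scheduling(pid=0):
    """Return the scheduling settings of pid (0 means the calling process).

    Hypothetical helper for illustration; it only reads state.
    """
    policy = os.sched_getscheduler(pid)
    try:
        # Interval in seconds as reported by the kernel for this task.
        rr_interval = os.sched_rr_get_interval(pid)
    except OSError:
        rr_interval = None  # not available in this environment
    return {
        "policy": POLICY_NAMES.get(policy, "unknown(%d)" % policy),
        "priority": os.sched_getparam(pid).sched_priority,
        "rr_interval_s": rr_interval,
        "allowed_cpus": sorted(os.sched_getaffinity(pid)),
    }


if __name__ == "__main__":
    for key, value in describe_scheduling().items():
        print(key, "=", value)
```

Running it against a PID from set B before and after a policy change shows whether the change took effect and what interval the kernel reports on that particular machine.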

You can use schedtool to set the policy for all PIDs in set B with a single command. It can also pin specific PIDs to specific cores, which would be the optimal solution because cache persistence then no longer depends on the time slice. This takes more effort, though, since you have to run a separate schedtool command for each PID.
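The same two operations (setting a policy and pinning a PID to one core) can also be done without schedtool. Here is a rough sketch in Python, assuming Linux; `pin_to_core` and `set_policy` are hypothetical helper names, and the PIDs and core numbers you pass would come from your own set B. Note that switching to SCHED_FIFO or SCHED_RR normally requires root or CAP_SYS_NICE, so the sketch reports a permission failure instead of crashing.

```python
import os


def pin_to_core(pid, core):
    """Restrict pid (0 means the calling process) to a single core.

    Returns the affinity mask the kernel reports afterwards.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


def set_policy(pid, policy, priority=0):
    """Try to switch pid to the given policy; return True on success.

    priority must be 0 for SCHED_OTHER and SCHED_BATCH, and within
    os.sched_get_priority_min/max(policy) for SCHED_FIFO and SCHED_RR.
    """
    try:
        os.sched_setscheduler(pid, policy, os.sched_param(priority))
    except PermissionError:
        return False  # real-time policies usually need elevated privileges
    return True


if __name__ == "__main__":
    # Demonstrate on the calling process, using a core it may already run on.
    allowed = sorted(os.sched_getaffinity(0))
    print("pinned to:", pin_to_core(0, allowed[0]))
    print("SCHED_BATCH applied:", set_policy(0, os.SCHED_BATCH))
```

Looping over the PIDs in set B and calling `pin_to_core` with a different core for each one gives the one-thread-per-core layout described above.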
