Where can I actually see what decisions a process scheduler made over a given time period? Do Unix/Linux systems maintain scheduling-specific logs I could look at, or should I be looking for specific lines in the general logs?
Unix/Linux process scheduler logs
Tags: logs, scheduling
Related Solutions
I don't have an answer, but you might find one amongst the tools, examples and resources written or listed by Brendan Gregg on the `perf` command and the Linux kernel's ftrace and debugfs. On my Raspberry Pi these tools were in the package `perf-tools-unstable`; the `perf` command itself was actually at `/usr/bin/perf_3.16`.
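For a quick look at individual scheduling decisions without `perf`, the ftrace `sched_switch` tracepoint mentioned above can be read directly. This is a sketch, assuming root and that tracefs is mounted at the usual `/sys/kernel/debug/tracing` location:

```shell
cd /sys/kernel/debug/tracing
echo 1 > events/sched/sched_switch/enable   # emit one event per context switch
head -20 trace_pipe                         # lines show prev_comm=... ==> next_comm=...
echo 0 > events/sched/sched_switch/enable   # stop tracing
```

Each line records which task the scheduler switched away from and which it picked next, with a timestamp and CPU number.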
Of interest may be this discussion and context-switch benchmark by Benoit Sigoure, and the lat_ctx test from the fairly old lmbench suite.
They may need some work to run on the Pi. For example, with tsuna/contextswitch I edited `get_iterations()` in `timectxswws.c` to `while (iterations * ws_pages * 4096UL < 4294967295UL) {`, and removed `-march=native -mno-avx` from the `Makefile`.
Using `perf record` for 10 seconds on the Pi over ssh, whilst simultaneously running `while sleep .1; do echo hi; done` in another ssh session:

```shell
sudo timeout 10 perf_3.16 record -e context-switches -a
sudo perf_3.16 script -f time,pid,comm | less
```
gives output like this:

```
sleep 29341 2703976.560357:
swapper 0 2703976.562160:
kworker/u8:2 29163 2703976.564901:
swapper 0 2703976.565737:
echo 29342 2703976.565768:
migration/3 19 2703976.567549:
sleep 29343 2703976.570212:
kworker/0:0 28906 2703976.588613:
rcu_preempt 7 2703976.609261:
sleep 29343 2703976.670674:
bash 29066 2703976.671654:
echo 29344 2703976.675065:
sshd 29065 2703976.675454:
swapper 0 2703976.677757:
```
presumably showing when a context-switch event happened, for which process.
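If you only want aggregate numbers rather than a per-event timeline, `perf stat` (from the same package) can count the same event; for example, system-wide for 10 seconds:

```shell
sudo perf_3.16 stat -e context-switches -a sleep 10
```

This prints a single total of context switches across all CPUs for the 10-second window, which is often enough to spot a scheduling-heavy workload before digging into `perf record` output.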
CPUScheduling{Policy|Priority}
The link tells you that `CPUSchedulingPriority` should only be set for `fifo` or `rr` ("real-time") tasks. You do not want to force real-time scheduling on services. `CPUSchedulingPolicy=other` is the default.
That leaves `batch` and `idle`. The difference between them is only relevant if you have multiple idle-priority tasks consuming CPU at the same time. In theory `batch` gives higher throughput (in exchange for longer latencies), but it's not a big win, so it's not really relevant in this case.

`idle` literally starves if anything else wants the CPU. CPU priority is rather less significant than it used to be on old UNIX systems with a single core. I would be happier starting with `nice`, e.g. nice level 10 or 14, before resorting to `idle`. See the next section.
However, most desktops are relatively idle most of the time, and when you do have a CPU hog that would pre-empt the background task, it's common for the hog to use only one of your CPUs. With that in mind, I would not consider it too risky to use `idle` on an average desktop or laptop. Unless it has an Atom / Celeron / ARM CPU rated at or below about 15 watts; then I would want to look at things a bit more carefully.
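If you do go that route, the settings can live in a drop-in for the service. The service and file names here are purely illustrative; `CPUSchedulingPolicy=` and `Nice=` are the real systemd directives:

```ini
# /etc/systemd/system/backup.service.d/sched.conf  (hypothetical service)
[Service]
CPUSchedulingPolicy=idle
# or the gentler alternative discussed above:
#CPUSchedulingPolicy=other
#Nice=10
```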
Is nice level 'subverted' by the kernel 'autogroup' feature?
Yeah.
Autogrouping is a little weird. The author of `systemd` didn't like the heuristic, even for desktops. If you want to test disabling autogrouping, you can set the sysctl `kernel.sched_autogroup_enabled` to `0`. I guess it's best to test by setting the sysctl in permanent configuration and rebooting, to make sure you get rid of all the autogroups.
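A sketch of that permanent route, assuming your distribution reads `/etc/sysctl.d/` at boot (systemd-based ones do; the file name is arbitrary):

```shell
echo 'kernel.sched_autogroup_enabled = 0' | sudo tee /etc/sysctl.d/50-disable-autogroup.conf
sudo reboot   # reboot so no pre-existing autogroups survive
```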
Then you should be able to set nice levels for your services without any problem. At least in current versions of systemd - see the next section.
E.g. nice level 10 will reduce the weight each thread has in the Linux CPU scheduler, to about 10%. Nice level 14 is under 5%. (Link: full formula)
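Those percentages follow from CFS weights falling by a factor of roughly 1.25 per nice level (the authoritative numbers are the `sched_prio_to_weight` table in the kernel's `kernel/sched/core.c`; this is the approximation behind that table):

```shell
# Relative CPU weight versus a nice-0 thread, per the ~1.25x-per-level rule
for n in 0 10 14; do
  awk -v n="$n" 'BEGIN { printf "nice %2d -> ~%.1f%% of default weight\n", n, 100 / (1.25 ^ n) }'
done
```

This prints roughly 100%, 10.7% and 4.4%, matching the "about 10%" and "under 5%" figures above.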
Appendix: is nice level 'subverted' by systemd cgroups?
The current `DefaultCPUAccounting=` setting defaults to off, unless it can be enabled without also enabling CPU control on a per-service basis. So it should be fine. You can check this in your current documentation: `man systemd-system.conf`.

Be aware that per-service CPU control will also be enabled when any service sets `CPUAccounting` / `CPUWeight` / `StartupCPUWeight` / `CPUShares` / `StartupCPUShares`.
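One way to check on a live system whether CPU accounting actually got switched on for a given service (the property name is real; the service name is just an example):

```shell
systemctl show -p CPUAccounting example.service
cat /sys/fs/cgroup/system.slice/cgroup.controllers   # path assumes cgroup v2
```

If `cpu` does not appear in the controllers file, per-service CPU control is not in effect for units under `system.slice`.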
The following blog extract is out of date (but still online). The default behaviour has since changed, and the reference documentation has been updated accordingly.
As a nice default, if the cpu controller is enabled in the kernel, systemd will create a cgroup for each service when starting it. Without any further configuration this already has one nice effect: on a systemd system every system service will get an even amount of CPU, regardless how many processes it consists off. Or in other words: on your web server MySQL will get the roughly same amount of CPU as Apache, even if the latter consists a 1000 CGI script processes, but the former only of a few worker tasks. (This behavior can be turned off, see DefaultControllers= in /etc/systemd/system.conf.)
On top of this default, it is possible to explicitly configure the CPU shares a service gets with the CPUShares= setting. The default value is 1024, if you increase this number you'll assign more CPU to a service than an unaltered one at 1024, if you decrease it, less.
Best Answer
My guess is that no such log exists, for the simple reason that so many processes are scheduled every second that you would be overwhelmed by the volume of log lines.