1) From the docs sched-design-CFS.txt:
CFS stands for "Completely Fair Scheduler," and is the new "desktop" process
scheduler implemented by Ingo Molnar and merged in Linux 2.6.23. It is the
replacement for the previous vanilla scheduler's SCHED_OTHER interactivity
code.
It seems you are mixing up the O(1) scheduler with the CFQ I/O scheduler.
So there are the SCHED_{NORMAL, BATCH, IDLE} policies; IDLE does not have any priorities. And there are the I/O scheduling classes idle, best-effort and realtime.
2) Sadly, you do not show which commands you typed. For example, to change init's I/O scheduling class to best-effort:
# ionice -p 1
none: prio 0
# ionice -c2 20 -p 1
# ionice -p 1
best-effort: prio 4
Let's start with the IO scheduler first. There's an IO scheduler per block device. Its job is to schedule (order) the requests that pile up in the device queue. There are three different algorithms currently shipped in the Linux kernel: deadline, noop and cfq. cfq is the default, and according to its doc:
The CFQ I/O scheduler tries to distribute bandwidth equally
among all processes in the system. It should provide a fair
and low latency working environment, suitable for both desktop
and server systems
You can configure which scheduler governs which device via the scheduler file corresponding to your block device under /sys/ (you can issue the following command to find it: find /sys | grep queue/scheduler).
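To make the layout of those scheduler files concrete, here is a minimal Python sketch that lists the schedulers offered for each block device. It assumes the usual file format, where all available schedulers are on one line and the active one is wrapped in square brackets (for example `noop deadline [cfq]`). The function names `parse_scheduler_line` and `list_schedulers` are made up for this illustration.

```python
import glob
import os


def parse_scheduler_line(line):
    """Split a queue/scheduler line into (available, active).

    Assumes the active scheduler is the name in square brackets,
    e.g. "noop deadline [cfq]". Returns active=None if nothing is bracketed.
    """
    available = []
    active = None
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return available, active


def list_schedulers(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its (available, active) schedulers."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        # .../<device>/queue/scheduler -> <device>
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as fh:
                result[device] = parse_scheduler_line(fh.read())
        except OSError:
            # Device vanished or file is unreadable; skip it.
            continue
    return result


if __name__ == "__main__":
    for device, (available, active) in list_schedulers().items():
        print(f"{device}: active={active} available={available}")
```

Switching the scheduler is done the other way around: as root, write one of the available names into the same file.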
What that short description doesn't say is that cfq is the only scheduler that looks at the ioprio of a process. ioprio is a setting that you can assign to the process, and the algorithm will take it into account when choosing which request to serve before another. ioprio can be set via the ionice utility.
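As a rough illustration of what an ioprio is, the sketch below packs an I/O scheduling class and a priority level into one integer and unpacks it again. It assumes the encoding used by the kernel's ioprio macros as I understand them (class in the bits above a shift of 13, level in the low bits). The helper names are hypothetical, and the output string only imitates the `ionice -p` output shown earlier.

```python
# Assumed to mirror the kernel's ioprio macros; treat the constants as an
# illustration, not as an authoritative copy of the header.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1

IOPRIO_CLASSES = {0: "none", 1: "realtime", 2: "best-effort", 3: "idle"}


def ioprio_value(io_class, level):
    """Pack an I/O class (0..3) and a priority level (0..7) into one int."""
    return (io_class << IOPRIO_CLASS_SHIFT) | level


def split_ioprio(value):
    """Unpack an ioprio integer into (class, level)."""
    return value >> IOPRIO_CLASS_SHIFT, value & IOPRIO_PRIO_MASK


def describe(value):
    """Format an ioprio roughly the way `ionice -p` prints it."""
    io_class, level = split_ioprio(value)
    return f"{IOPRIO_CLASSES.get(io_class, 'unknown')}: prio {level}"


if __name__ == "__main__":
    print(describe(ioprio_value(2, 4)))  # best-effort class, level 4
    print(describe(0))                   # nothing set explicitly
```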
Then, there's the task scheduler. Its job is to allocate the CPUs amongst the processes that are ready to run. It takes into account things like the priority, the class and the niceness of a given process, as well as how long that process has run and other heuristics.
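The inputs the task scheduler looks at can be inspected from user space. The following Python sketch, assuming a Linux system, reads the scheduling policy and the nice value of the current process. Note that the kernel's SCHED_NORMAL is called SCHED_OTHER in user space; `describe_cpu_scheduling` is a name made up for this example.

```python
import os

# Build a number -> name table from whatever policies this platform exposes.
# SCHED_OTHER is the user-space name of the kernel's SCHED_NORMAL.
POLICY_NAMES = {}
for _name in ("SCHED_OTHER", "SCHED_BATCH", "SCHED_IDLE", "SCHED_FIFO", "SCHED_RR"):
    if hasattr(os, _name):
        POLICY_NAMES[getattr(os, _name)] = _name


def describe_cpu_scheduling(pid=0):
    """Return (policy name, nice value) for a process; pid 0 means ourselves."""
    policy = os.sched_getscheduler(pid)
    nice = os.getpriority(os.PRIO_PROCESS, pid)
    # Fall back to the raw number if the policy is not in our table.
    return POLICY_NAMES.get(policy, str(policy)), nice


if __name__ == "__main__":
    policy, nice = describe_cpu_scheduling()
    print(f"policy={policy} nice={nice}")
```

None of these values has anything to do with the ioprio described above; they are separate settings for a separate scheduler.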
Now, to your questions:
What is relationship between IO scheduler and CPU scheduler?
Not much, besides the name. They schedule different shared resources. The first one orders the requests going to the disks, and the second one schedules the 'requests' (you can view a process as requesting CPU time to be able to run) to the CPU.
CPU scheduling happens first. IO scheduler is a thread itself and subject to CPU scheduling.
It doesn't happen like that: the IO scheduler algorithm is run by whichever process is queuing a request. A good way to see this is to look at crashes that have elv_add_request() in their path. For example:
[...]
[<c027fac4>] error_code+0x74/0x7c
[<c019ed65>] elv_next_request+0x6b/0x116
[<e08335db>] scsi_request_fn+0x5e/0x26d [scsi_mod]
[<c019ee6a>] elv_insert+0x5a/0x134
[<c019efc1>] __elv_add_request+0x7d/0x82
[<c019f0ab>] elv_add_request+0x16/0x1d
[<e0e8d2ed>] pkt_generic_packet+0x107/0x133 [pktcdvd]
[<e0e8d772>] pkt_get_disc_info+0x42/0x7b [pktcdvd]
[<e0e8eae3>] pkt_open+0xbf/0xc56 [pktcdvd]
[<c0168078>] do_open+0x7e/0x246
[<c01683df>] blkdev_open+0x28/0x51
[<c014a057>] __dentry_open+0xb5/0x160
[<c014a183>] nameidata_to_filp+0x27/0x37
[<c014a1c6>] do_filp_open+0x33/0x3b
[<c014a211>] do_sys_open+0x43/0xc7
[<c014a2cd>] sys_open+0x1c/0x1e
[<c0102b82>] sysenter_past_esp+0x5f/0x85
Notice how the process enters the kernel calling open(), and this ends up involving the elevator (elv) algorithm.
Best Answer
Try something like this:
All processes start with just one thread, and can create more, using pthread_create for example. (All of a process's threads created this way share the same address space.) The kernel's scheduler works on these threads, regardless of whether they're a process's "main"/initial thread or additional ones - there's essentially no difference between them from the scheduler's point of view.
Linux initially didn't have threads at all, only processes. So the part of the OS that schedules "CPU work" is generally called the process scheduler, for historical reasons. (This isn't Linux-specific; the same holds for most (all?) Unix-type systems. Thread scheduler is simply not the usual vocabulary.)
I wouldn't even mention clone (let alone vfork) at that point, unless you've already explained the whole namespaces business.
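If a small demonstration helps the explanation, the point that every thread is its own schedulable entity inside one process can be shown with a short Python sketch. It starts a few threads and records the process ID and the kernel-level thread ID each one sees. It assumes Python 3.8+ on a platform that supports `threading.get_native_id()`; `collect_ids` is a hypothetical helper name.

```python
import os
import threading


def collect_ids(n_threads=3):
    """Return ((pid, tid) of the caller, [(pid, tid) of each worker])."""
    records = []
    lock = threading.Lock()
    # Keep all workers alive until every one has recorded its IDs, so that
    # no kernel thread ID can be reused by a later worker.
    barrier = threading.Barrier(n_threads)

    def worker():
        with lock:
            records.append((os.getpid(), threading.get_native_id()))
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    caller = (os.getpid(), threading.get_native_id())
    return caller, records


if __name__ == "__main__":
    caller, records = collect_ids()
    print("caller :", caller)
    for pid, tid in records:
        # Same pid everywhere, but a different tid per thread.
        print("worker :", (pid, tid))
```

All threads report the same process ID, while each has its own thread ID, which is what the kernel's scheduler actually works with.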