Is the Unix process scheduler itself a process, or does it piggyback on other processes in the same way a system call does (running kernel code in the user process with the kernel bit set)?
Related Solutions
Let's start with the I/O scheduler first. There's an I/O scheduler per block device. Its job is to schedule (order) the requests that pile up in the device queue. There are three different algorithms currently shipped in the Linux kernel: deadline, noop and cfq. cfq is the default, and according to its doc:
The CFQ I/O scheduler tries to distribute bandwidth equally among all processes in the system. It should provide a fair and low latency working environment, suitable for both desktop and server systems.
You can configure which scheduler governs which device via the scheduler file corresponding to your block device under /sys/ (you can issue the following command to find it: find /sys | grep queue/scheduler).
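As a quick sketch (device names vary from system to system, so sda below is an assumption), you can inspect and change the active scheduler like this:

```shell
# List the schedulers available for each block device;
# the active one is shown in [brackets] (paths typical on Linux)
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue
    printf '%s: ' "$f"
    cat "$f"
done

# Switch sda to the deadline scheduler (requires root; sda is an assumption)
# echo deadline > /sys/block/sda/queue/scheduler
```

The change takes effect immediately and only lasts until reboot; to make it permanent you would typically set it via a boot parameter or an init script.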
What that short description doesn't say is that cfq is the only scheduler that looks at the ioprio of a process. ioprio is a setting you can assign to a process, and the algorithm takes it into account when choosing one request over another. ioprio can be set via the ionice utility.
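For example (the file paths are throwaway illustrations, and the priorities only take effect when the device is using cfq):

```shell
# Create a throwaway file to copy
dd if=/dev/zero of=/tmp/blob bs=1M count=8 2>/dev/null

# Run the copy in the idle I/O class: it only gets disk time when no
# other process wants it (only cfq honours ioprio; noop/deadline ignore it)
ionice -c 3 cp /tmp/blob /tmp/blob.copy

# Best-effort class (2) with the lowest priority within it (7)
ionice -c 2 -n 7 cp /tmp/blob /tmp/blob.copy2

# Show the I/O class and priority of the current shell
ionice -p $$
```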
Then, there's the task scheduler. Its job is to allocate the CPUs amongst the processes that are ready to run. It takes into account things like the priority, the class and the niceness of a given process, as well as how long that process has run and other heuristics.
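For instance, a sketch using the standard nice/renice/ps utilities (the busy loop is just a stand-in for a CPU-bound job):

```shell
# Start a CPU-bound job with the lowest priority (niceness 19)
nice -n 19 sh -c 'i=0; while [ $i -lt 100000 ]; do i=$((i+1)); done' &
job=$!

# A child inherits its niceness; with no command, nice prints the
# current value (10 here, assuming the parent shell is at niceness 0)
nice -n 10 sh -c 'nice'

# Show PID, niceness, scheduling class and command for running tasks
ps -eo pid,ni,cls,comm | head -5

wait "$job"
```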
Now, to your questions:
What is relationship between IO scheduler and CPU scheduler?
Not much, besides the name. They schedule different shared resources. The first one orders the requests going to the disks, and the second one schedules the 'requests' (you can view a process as requesting CPU time to be able to run) to the CPU.
CPU scheduling happens first. IO scheduler is a thread itself and subject to CPU scheduling.
It doesn't happen like that: the I/O scheduler is not a thread of its own; its algorithm is run by whichever process is queuing a request. A good way to see this is to look at crashes that have elv_add_request() in their path. For example:
[...]
[<c027fac4>] error_code+0x74/0x7c
[<c019ed65>] elv_next_request+0x6b/0x116
[<e08335db>] scsi_request_fn+0x5e/0x26d [scsi_mod]
[<c019ee6a>] elv_insert+0x5a/0x134
[<c019efc1>] __elv_add_request+0x7d/0x82
[<c019f0ab>] elv_add_request+0x16/0x1d
[<e0e8d2ed>] pkt_generic_packet+0x107/0x133 [pktcdvd]
[<e0e8d772>] pkt_get_disc_info+0x42/0x7b [pktcdvd]
[<e0e8eae3>] pkt_open+0xbf/0xc56 [pktcdvd]
[<c0168078>] do_open+0x7e/0x246
[<c01683df>] blkdev_open+0x28/0x51
[<c014a057>] __dentry_open+0xb5/0x160
[<c014a183>] nameidata_to_filp+0x27/0x37
[<c014a1c6>] do_filp_open+0x33/0x3b
[<c014a211>] do_sys_open+0x43/0xc7
[<c014a2cd>] sys_open+0x1c/0x1e
[<c0102b82>] sysenter_past_esp+0x5f/0x85
Notice how the process enters the kernel calling open(), and this ends up involving the elevator (elv) algorithm.
You can get a lot of internal information about processes, the scheduler, and other components of the OS and the hardware by using
cat /proc/...
where ... can be many things. For instance, it can be a process ID followed by a request for specific information about that process, or a request for scheduler debug information, for example:
cat /proc/sched_debug
To see the whole list of options, type ls /proc
. You will see a long list of process ID numbers, but also a few interesting names, such as sched_debug, cpuinfo, meminfo, uptime, and more.
All this is available thanks to the procfs virtual file system.
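A couple of quick examples (the fields shown are typical on Linux; "model name" in cpuinfo is an x86 convention and may differ on other architectures):

```shell
# Name, state and context-switch counts of the process reading the file
# (here: the grep process itself, via /proc/self)
grep -E '^(Name|State|voluntary_ctxt_switches):' /proc/self/status

# CPU model of the first core, and the number of cores
grep -m1 'model name' /proc/cpuinfo
grep -c '^processor' /proc/cpuinfo
```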
Another useful command is:
top
This will show real-time information about how processes are scheduled, memory usage, and more.
Best Answer
A Unix process scheduler doesn't really "piggyback" on a system call. Executing the scheduler is part of just about any system call.
A read() system call or an exit() system call absolutely has to cause the scheduler to execute. In the case of a read(), a disk access might take a very long time. Unless you want everything to be very slow, you need to run the scheduler to see what process should run while the first process waits for the disk to come back with data. A read() might happen on a socket - the time it takes for data to come back from some remote server is indeterminate. The kernel must reschedule some other process. In the case of an exit(), the process making the system call doesn't want to exist any more, so some other process must be scheduled. For a pause() or an alarm(), the process wants to execute some time in the future. Again, the scheduler has to pick another process to run.

I believe that most, but not all, system calls cause the Unix/Linux/*BSD scheduler to execute. Sometimes gettimeofday() doesn't cause the scheduler to run - Solaris used to work that way. But in general, you can safely think of a system call as doing the work (sending data over the NIC, setting up for a disk read or write, doing process exit work, forking, whatever), running the scheduler, and then executing whatever process is supposed to run next. Sometimes that's the same process that made the system call, but a lot of the time, it's not.
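You can watch this happening from the shell: every time a process blocks in a system call and the scheduler hands the CPU to someone else, the kernel counts a voluntary context switch. A sketch (the voluntary_ctxt_switches field name is from Linux's /proc/PID/status):

```shell
# voluntary_ctxt_switches counts how often this shell blocked in a
# syscall and the scheduler ran something else in the meantime
before=$(awk '/^voluntary_ctxt/ {print $2}' /proc/$$/status)

sleep 1    # the shell blocks in wait(); the scheduler picks other work

after=$(awk '/^voluntary_ctxt/ {print $2}' /proc/$$/status)
echo "voluntary context switches: $before -> $after"
```

The counter goes up because waiting for the sleep child is itself a blocking system call: the shell yields the CPU and the scheduler runs.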