I have been looking into the iowait value shown in the output of the top utility, as in the example below.
top - 07:30:58 up 3:37, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 86 total, 1 running, 85 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
iowait is generally defined as follows:
"It is the time during which CPU is idle and there is some IO pending."
It is my understanding that a process runs on a single CPU at a time. After it is de-scheduled, either because it has used up its time slice or because it has blocked, it can eventually be scheduled again on any CPU.
In the case of an I/O request, the CPU that puts a process into uninterruptible sleep would be responsible for tracking its iowait time; the other CPUs would report that same time as idle, since they really are idle. Is this assumption correct?
Furthermore, suppose there is a long-running I/O request (meaning the process had several opportunities to be scheduled, but wasn't because its I/O had not completed): how does a CPU know there is "pending IO"? Where is that information fetched from? How can a CPU find out that some process was put to sleep a while ago to wait for I/O, given that any of the CPUs could have put it to sleep? How is this "pending IO" status confirmed?
Best Answer
The CPU doesn't know any of this; the task scheduler does.

The definition you quote is somewhat misleading; the current procfs(5) manpage has a more accurate definition, with caveats:

    iowait (since Linux 2.5.41)
           (5) Time waiting for I/O to complete. This value is not
           reliable, for the following reasons:

           1. The CPU will not wait for I/O to complete; iowait is the
              time that a task is waiting for I/O to complete. When a
              CPU goes into idle state for outstanding task I/O,
              another task will be scheduled on this CPU.

           2. On a multi-core CPU, the task waiting for I/O to
              complete is not running on any CPU, so the iowait of
              each CPU is difficult to calculate.

           3. The value in this field may decrease in certain
              conditions.

iowait tries to measure time spent waiting for I/O in general. It is not tracked by a specific CPU, nor can it be (point 2 above, which also matches what you are wondering about). It is, however, measured per CPU as far as possible.

The task scheduler "knows" there is pending I/O because it knows that it suspended a given task while that task waits for I/O. This is tracked per task in the in_iowait field of task_struct; you can look for in_iowait in the scheduler core to see how it is set, tracked and cleared. Brendan Gregg's recent article on Linux load averages includes useful background information.
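You can see the mechanism in the io_schedule() path, one of the places where the flag is toggled. Here is a simplified sketch of what kernel/sched/core.c does (paraphrased for illustration; the real code also flushes block plugs, and details vary across kernel versions):

    /* Simplified sketch of io_schedule() from kernel/sched/core.c.
     * A task about to block on I/O marks itself with in_iowait before
     * calling into the scheduler, and restores the old value afterwards. */
    int io_schedule_prepare(void)
    {
            int old_iowait = current->in_iowait;

            current->in_iowait = 1;     /* this task is now waiting for I/O */
            return old_iowait;
    }

    void io_schedule_finish(int token)
    {
            current->in_iowait = token; /* restore the previous state */
    }

    void io_schedule(void)
    {
            int token = io_schedule_prepare();

            schedule();                 /* hand the CPU to another task */
            io_schedule_finish(token);
    }

While the task sleeps, the scheduler also bumps a per-runqueue counter, nr_iowait, on the CPU the task last ran on; that counter is what the accounting described below consults.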
The iowait entry in /proc/stat, which is what ends up in top, is incremented whenever a timer tick is accounted and the task currently "on" the CPU is idle; you can see this by looking for account_idle_time in the scheduler's CPU time-tracking code.
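That function is short enough to show almost in full; a simplified sketch based on kernel/sched/cputime.c (exact code varies by kernel version):

    /* Sketch of account_idle_time(), simplified from kernel/sched/cputime.c.
     * An idle tick is charged to iowait if a task that last ran on this
     * CPU's runqueue is currently blocked on I/O, and to idle otherwise. */
    void account_idle_time(u64 cputime)
    {
            u64 *cpustat = kcpustat_this_cpu->cpustat;
            struct rq *rq = this_rq();

            if (atomic_read(&rq->nr_iowait) > 0)
                    cpustat[CPUTIME_IOWAIT] += cputime;
            else
                    cpustat[CPUTIME_IDLE] += cputime;
    }

This is also why iowait is "measured per CPU as far as possible": each CPU only checks its own runqueue's nr_iowait count.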
So a more accurate definition would be "time spent on this CPU waiting for I/O, when there was nothing better to do"…
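If you want to watch the raw counters that top digests, they are right there in /proc/stat; here is a minimal, runnable sketch (field order on each cpu line, per proc(5): user, nice, system, idle, iowait, ...):

    /* Minimal sketch: print the aggregate idle and iowait counters from
     * /proc/stat. Values are cumulative clock ticks (USER_HZ, usually 100);
     * tools like top show the deltas between samples as percentages. */
    #include <stdio.h>

    int main(void)
    {
            char cpu[16];
            unsigned long long user, nice, sys, idle, iowait;
            FILE *f = fopen("/proc/stat", "r");

            if (!f) {
                    perror("/proc/stat");
                    return 1;
            }
            /* The first line aggregates all CPUs; "cpu0", "cpu1", ... follow. */
            if (fscanf(f, "%15s %llu %llu %llu %llu %llu",
                       cpu, &user, &nice, &sys, &idle, &iowait) == 6)
                    printf("%s: idle=%llu iowait=%llu ticks\n", cpu, idle, iowait);

            fclose(f);
            return 0;
    }

Sampling it twice and dividing each field's delta by the total delta gives, roughly, the %Cpu(s) figures top prints.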