iostat Command – Meaning of ‘Steal’ Field


In the output of iostat there is a steal field. According to the man page, the field is used to:

Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

But what does that mean? Does it mean the kernel itself is too busy to manage a CPU, causing the CPU to be idle?

Best Answer

The hypervisor is the layer that manages a virtual environment, such as VMware, Xen or VirtualBox.

So the steal field is worth monitoring to detect problems or oversubscription in a virtualised environment. It measures the time the VM's virtual CPU has to wait for other VMs (virtual machines) to finish their turn (time slice), or for a task of the hypervisor itself.
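To make "time the VM CPU has to wait" concrete: on Linux the kernel exposes cumulative steal time as the eighth counter on the `cpu` line of `/proc/stat`, and tools such as iostat and sar report the change in that counter as a percentage of all CPU time elapsed over the sampling interval. The sketch below is a simplified version of that calculation, not sysstat's actual code. `struct cpu_sample`, `read_cpu_sample()` and `steal_percent()` are hypothetical names, and it assumes the guest columns can be ignored because the kernel already counts guest time inside user/nice.

```c
#include <stdio.h>

/* Aggregate jiffy counters from the first ("cpu") line of /proc/stat, in
 * kernel order: user nice system idle iowait irq softirq steal.
 * Assumption: guest/guest_nice are skipped because the kernel already
 * folds guest time into user/nice. */
enum { F_USER, F_NICE, F_SYSTEM, F_IDLE, F_IOWAIT, F_IRQ, F_SOFTIRQ, F_STEAL,
       NFIELDS };

struct cpu_sample {
    unsigned long long v[NFIELDS];
};

/* Read one sample. Returns 0 on success, -1 if the file cannot be opened
 * or the line has fewer than 8 counters (no steal column). */
int read_cpu_sample(struct cpu_sample *s)
{
    FILE *fp = fopen("/proc/stat", "r");
    if (fp == NULL)
        return -1;
    int n = fscanf(fp, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->v[F_USER], &s->v[F_NICE], &s->v[F_SYSTEM],
                   &s->v[F_IDLE], &s->v[F_IOWAIT], &s->v[F_IRQ],
                   &s->v[F_SOFTIRQ], &s->v[F_STEAL]);
    fclose(fp);
    return n == NFIELDS ? 0 : -1;
}

/* Share (in %) of the jiffies elapsed between two samples that were
 * accounted as steal, i.e. given by the hypervisor to someone else. */
double steal_percent(const struct cpu_sample *prev,
                     const struct cpu_sample *cur)
{
    unsigned long long total = 0;
    for (int i = 0; i < NFIELDS; i++)
        total += cur->v[i] - prev->v[i];
    if (total == 0)
        return 0.0;
    return 100.0 * (double)(cur->v[F_STEAL] - prev->v[F_STEAL]) / (double)total;
}
```

Take one sample, wait (for example with `sleep(1)` from `<unistd.h>`), take a second sample, and pass both to `steal_percent()`. On a guest whose hypervisor does not report steal time, the counter should simply stay at 0.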

The steal field appears in the output of the iostat and sar commands (as %steal) and of vmstat and top (as st).

However, this thread confirms that the steal field is not supported in VMware VMs (I tested it on VMware 5.5 and can corroborate this). VirtualBox does not provide CPU steal time data either. It is supported in Xen and KVM virtual environments.

vmstat also reports this field in its CPU section, but only from Debian 8 onward. For sar to report it, sysstat data collection has to be enabled.

As per man vmstat:

st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
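The "Prior to Linux 2.6.11, unknown" note corresponds to the `/proc/stat` layout: older kernels printed fewer counters on each `cpu` line, so there was no steal column to read. A minimal sketch of a parser that tells the two cases apart; `steal_from_stat_line()` is a hypothetical helper, not part of any tool mentioned here.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given one "cpu" or "cpuN" line from /proc/stat,
 * return its steal counter (the 8th number), or -1 if it is not a cpu
 * line or has fewer than 8 numbers, i.e. steal is "unknown". */
long long steal_from_stat_line(const char *line)
{
    if (strncmp(line, "cpu", 3) != 0)
        return -1;
    const char *p = line + 3;
    while (isdigit((unsigned char)*p))   /* skip the N in "cpuN" */
        p++;

    long long value = -1;
    for (int field = 1; field <= 8; field++) {
        char *end;
        long long x = strtoll(p, &end, 10);  /* skips leading blanks */
        if (end == p)                        /* ran out of numbers */
            return -1;
        value = x;
        p = end;
    }
    return value;                            /* 8th field = steal */
}
```

For example, `steal_from_stat_line("cpu0 1 2 3 4 5 6 7 8")` returns 8, while a line carrying only seven numbers returns -1.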

Further reading: CPU Time stolen from a virtual machine?

It’s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for another VM, or for the Hypervisor host itself. If no time were stolen, this time would be used to run your CPU workload or your idle thread.

Related Solutions

Linux – Understanding ‘top’ Command CPU Usage Metrics

hi is the time spent processing hardware interrupts. Hardware interrupts are generated by hardware devices (network cards, keyboard controller, external timer, hardware sensors, ...) when they need to signal something to the CPU (data has arrived, for example).

Since these can happen very frequently, and since they essentially block the current CPU while they are running, kernel hardware interrupt handlers are written to be as fast and simple as possible.

If long or complex processing needs to be done, these tasks are deferred using a mechanism called softirqs. Softirqs are scheduled independently, can run on any CPU, and can even run concurrently (none of which is true of hardware interrupt handlers).

(The parts about hard IRQs blocking the current CPU and about softirqs being able to run anywhere are not exactly correct: there can be limitations, and some hard IRQs can interrupt others.)

As an example, a "data received" hardware interrupt from a network card could simply store the information "card ethX needs to be serviced" somewhere and schedule a softirq. The softirq would be the thing that triggers the actual packet routing.
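The split between a fast hardware interrupt handler and deferred softirq work can be illustrated with a toy, single-threaded user-space model. Nothing here is a kernel API: `hard_irq()`, `run_softirq()` and `route_packets()` are hypothetical stand-ins, and real softirqs involve per-CPU state and concurrency that this sketch ignores. In this model, several interrupts from the same device that arrive before the deferred handler runs are serviced in a single pass.

```c
#include <stdint.h>

/* Toy model, not kernel code. Bit n of `pending` means
 * "device n needs to be serviced". */
static uint32_t pending;
static unsigned long packets_routed;

/* Hypothetical stand-in for the slow work (e.g. packet routing). */
static void route_packets(int dev)
{
    (void)dev;
    packets_routed++;
}

/* "Hard IRQ" part: must be fast, so it only records which device fired. */
void hard_irq(int dev)
{
    pending |= UINT32_C(1) << dev;
}

/* "Softirq" part: runs later and does the real work for every device
 * flagged since the last run. Returns the number of devices serviced. */
int run_softirq(void)
{
    uint32_t work = pending;   /* take the current set of requests */
    pending = 0;
    int served = 0;
    for (int dev = 0; dev < 32; dev++) {
        if (work & (UINT32_C(1) << dev)) {
            route_packets(dev);
            served++;
        }
    }
    return served;
}
```

For example, `hard_irq(0); hard_irq(0); hard_irq(3);` followed by `run_softirq()` services two devices, and an immediate second `run_softirq()` services none.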

si represents the time spent in these softirqs.

A good read about the softirq mechanism (with a bit of history too) is Matthew Wilcox's I'll Do It Later: Softirqs, Tasklets, Bottom Halves, Task Queues, Work Queues and Timers (PDF, 64k).

st, "steal time", is only relevant in virtualized environments. It represents time when the real CPU was not available to the current virtual machine — it was "stolen" from that VM by the hypervisor (either to run another VM, or for its own needs).

The CPU time accounting document from IBM has more information about steal time and CPU accounting in virtualized environments. (It's aimed at zSeries-type hardware, but the general idea is the same for most platforms.)

Linux – how does blktrace work

It appears that there are additional undocumented flags in the RWBS field and that the 'B' for barrier is deprecated. N denotes anything that is not discard, read or write.

D - discard
W - write
R - read
N - None of the above 

F - FUA when it follows the operation letter (a leading F means flush)
A - readahead
S - sync
M - metadata



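```c
/* From blkparse_fmt.c in the blktrace sources. struct blk_io_trace and
 * the BLK_TC_* category macros come from blktrace_api.h. */
#include "blktrace_api.h"

/* Build the RWBS string for one trace event. rwbs must hold at least
 * 7 bytes: up to 6 flag letters plus the terminating NUL. */
static inline void fill_rwbs(char *rwbs, struct blk_io_trace *t)
{
    /* Each flag is a category bit shifted into t->action by BLK_TC_ACT(). */
    int w = t->action & BLK_TC_ACT(BLK_TC_WRITE);
    int a = t->action & BLK_TC_ACT(BLK_TC_AHEAD);
    int s = t->action & BLK_TC_ACT(BLK_TC_SYNC);
    int m = t->action & BLK_TC_ACT(BLK_TC_META);
    int d = t->action & BLK_TC_ACT(BLK_TC_DISCARD);
    int f = t->action & BLK_TC_ACT(BLK_TC_FLUSH);
    int u = t->action & BLK_TC_ACT(BLK_TC_FUA);
    int i = 0;

    if (f)
        rwbs[i++] = 'F'; /* flush: emitted before the operation letter */

    /* Exactly one operation letter: discard, write, read or none. */
    if (d)
        rwbs[i++] = 'D';
    else if (w)
        rwbs[i++] = 'W';
    else if (t->bytes)
        rwbs[i++] = 'R'; /* not a write, but data is transferred */
    else
        rwbs[i++] = 'N'; /* no data transferred */

    if (u)
        rwbs[i++] = 'F'; /* fua: emitted after the operation letter */
    if (a)
        rwbs[i++] = 'A'; /* readahead */
    if (s)
        rwbs[i++] = 'S'; /* sync */
    if (m)
        rwbs[i++] = 'M'; /* metadata */

    rwbs[i] = '\0';
}
```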
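Because `F` can appear on either side of the operation letter, reading RWBS strings by eye is easy to get wrong. Here is a sketch of the reverse direction, decoding a string produced by `fill_rwbs()` back into flag names. `decode_rwbs()`, `append()` and the comma-separated output format are hypothetical, and the decoder accepts only the letters and ordering that `fill_rwbs()` above can emit.

```c
#include <stddef.h>
#include <string.h>

/* Append `word` to the comma-separated list in `out`; -1 if it won't fit. */
static int append(char *out, size_t outlen, const char *word)
{
    size_t used = strlen(out);
    size_t need = used + (used ? 1 : 0) + strlen(word) + 1;
    if (need > outlen)
        return -1;
    if (used)
        strcat(out, ",");
    strcat(out, word);
    return 0;
}

/* Hypothetical inverse of fill_rwbs(): "FWFS" -> "flush,write,fua,sync".
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int decode_rwbs(const char *rwbs, char *out, size_t outlen)
{
    /* Optional flags after the operation letter, in fill_rwbs() order. */
    static const struct { char c; const char *name; } suffix[] = {
        { 'F', "fua" }, { 'A', "readahead" }, { 'S', "sync" }, { 'M', "metadata" },
    };
    const char *p = rwbs;
    const char *op;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    if (*p == 'F') {                 /* leading F: flush */
        if (append(out, outlen, "flush") < 0)
            return -1;
        p++;
    }
    switch (*p++) {                  /* exactly one operation letter */
    case 'D': op = "discard"; break;
    case 'W': op = "write"; break;
    case 'R': op = "read"; break;
    case 'N': op = "none"; break;
    default:  return -1;
    }
    if (append(out, outlen, op) < 0)
        return -1;

    for (size_t i = 0; i < sizeof suffix / sizeof suffix[0]; i++) {
        if (*p == suffix[i].c) {
            if (append(out, outlen, suffix[i].name) < 0)
                return -1;
            p++;
        }
    }
    return *p == '\0' ? 0 : -1;      /* leftovers: not fill_rwbs() output */
}
```

For example, `decode_rwbs("FWFS", buf, sizeof buf)` fills `buf` with `flush,write,fua,sync`, while `"WSA"` is rejected because `A` cannot follow `S`.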
