Tools for Monitoring Steal Time (st)

Tags: top, virtualization, xen

We're running on a virtual "dedicated" server, which should, in theory, mean that we're the only guys on the server. In practice, I'm thinking we might not be.

[Screenshot of top output; image not available]

Notice that although it looks like we're killing our machine, "steal time" is at 71%.

I'm collecting load statistics, and I was disappointed that this stat doesn't show up in my graphs. Are there any tools that monitor steal time and might be able to help?


Additional information:

We're running 4 cores, model:

# grep "model name" /proc/cpuinfo | sort -u
model name  : Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz

Best Answer

Your question is well defined, but you haven't given much information about your environment, how you're currently monitoring, or what graphing tools you're using. However, since SNMP is used almost universally for this, I'll assume that you're using it and have at least some familiarity with it.

Although (as near as I can tell) CPU steal time isn't currently available from snmpd, you can extend snmpd yourself with exec commands, whose output is exposed through the UCD-SNMP-MIB::extOutput object.

The easiest way (that I've found) to get the steal time is from iostat. In the output of iostat -c, line 4 holds the CPU percentages and its fifth field is %steal, so the following construct prints just the steal time:

$ iostat -c | awk 'NR==4 {print $5}'
0.00

Therefore, append the following to your snmpd.conf:

exec cpu_steal_time /usr/bin/iostat -c | /usr/bin/awk 'NR==4 {print $5}'

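(Alternatively, you can put the command in a wrapper script and call the wrapper from inside snmpd.conf. This is the safer option if your snmpd runs exec commands directly rather than through a shell, because the pipe would then not be interpreted.)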
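A wrapper along those lines might look like the sketch below. It is a variation on the one-liner above rather than a copy of it: it finds the %steal column by its header name instead of a fixed position, and it asks iostat for two reports so that the value comes from the most recent one-second interval (the first iostat report is the average since boot). The function name extract_steal and the install path mentioned afterwards are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper script for snmpd's exec directive.

# Read `iostat -c` output on stdin and print the %steal value of the
# last report found. The column is located via the "avg-cpu:" header.
extract_steal() {
    awk '
        /^avg-cpu:/ {
            col = 0
            # Header fields are shifted by one because of the leading
            # "avg-cpu:" label, so value column = header column - 1.
            for (i = 2; i <= NF; i++) if ($i == "%steal") col = i - 1
            if (col && (getline line) > 0) {
                split(line, vals)
                steal = vals[col]
            }
        }
        END { if (steal != "") print steal }
    '
}

# LC_ALL=C keeps the decimal separator a dot regardless of system locale.
# Two reports, one second apart; only the second one is printed.
if command -v iostat >/dev/null 2>&1; then
    LC_ALL=C iostat -c 1 2 | extract_steal
fi
```

Saved as, say, /usr/local/bin/cpu_steal_time and made executable, it could be referenced from snmpd.conf as: exec cpu_steal_time /usr/local/bin/cpu_steal_time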

Each exec call in snmpd.conf is indexed starting from 1. So if you only have a single exec statement then you'll want to poll UCD-SNMP-MIB::extOutput.1. If this is the 5th exec statement then poll UCD-SNMP-MIB::extOutput.5, etc.

The numeric OID for UCD-SNMP-MIB::extOutput is .1.3.6.1.4.1.2021.8.1.101 so if you're at index 1 it would be .1.3.6.1.4.1.2021.8.1.101.1, and index 5 would be .1.3.6.1.4.1.2021.8.1.101.5, etc.
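To make the indexing concrete, here is a small sketch that builds the numeric OID for a given exec index. The helper name ext_output_oid is invented for this example; the commented snmpget line uses "localhost" and the community string "public" purely as placeholders for your own host and credentials.

```shell
#!/bin/sh
# Base OID of UCD-SNMP-MIB::extOutput, as quoted in the answer.
EXT_OUTPUT_OID=".1.3.6.1.4.1.2021.8.1.101"

# Print the numeric OID for the Nth exec statement (indexes start at 1).
ext_output_oid() {
    case "$1" in
        ''|*[!0-9]*|0*)
            echo "index must be a positive integer" >&2
            return 1
            ;;
    esac
    printf '%s.%s\n' "$EXT_OUTPUT_OID" "$1"
}

ext_output_oid 1   # OID for the first exec statement
ext_output_oid 5   # OID for the fifth exec statement

# Polling the value with net-snmp's snmpget (placeholder host/community):
# snmpget -v2c -c public localhost "$(ext_output_oid 1)"
```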

You then create a graph of type gauge that polls that SNMP OID, with a range of 0–100. This should give you some pretty graphs.
