I ran a shell pipeline under perf stat
, using taskset 0x1
to pin the whole pipeline to a single CPU. I know taskset 0x1
had an effect, because it more than doubled the throughput of the pipeline. However, perf stat
shows 0 context switches between the different processes of the pipeline.
So what exactly does perf stat
mean by context switches?
I think I was interested in the number of context switches to/from the individual tasks in the pipeline. Is there a better way to measure that?
This was in the context of comparing dd bs=1M </dev/zero
, to dd bs=1M </dev/zero | dd bs=1M >/dev/null
. If I can measure context switches as desired, I assume that it would be useful in quantifying why the first version is several times more "efficient" than the second.
$ rpm -q perf
perf-4.15.0-300.fc27.x86_64
$ uname -r
4.15.17-300.fc27.x86_64
$ perf stat taskset 0x1 sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C18366+0 records in
18366+0 records out
19258146816 bytes (19 GB, 18 GiB) copied, 5.0566 s, 3.8 GB/s
Performance counter stats for 'taskset 0x1 sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':
5059.273255 task-clock:u (msec) # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
414 page-faults:u # 0.082 K/sec
36,915,934 cycles:u # 0.007 GHz
9,511,905 instructions:u # 0.26 insn per cycle
2,480,746 branches:u # 0.490 M/sec
188,295 branch-misses:u # 7.59% of all branches
5.061473119 seconds time elapsed
$ perf stat sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C6637+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.04907 s, 1.7 GB/s
6636+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.0492 s, 1.7 GB/s
sh: Interrupt
Performance counter stats for 'sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':
3560.269345 task-clock:u (msec) # 0.878 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
355 page-faults:u # 0.100 K/sec
32,302,387 cycles:u # 0.009 GHz
4,823,855 instructions:u # 0.15 insn per cycle
1,167,126 branches:u # 0.328 M/sec
88,982 branch-misses:u # 7.62% of all branches
4.052844128 seconds time elapsed
Best Answer
perf was silently failing to count context switches because you were not root.
(Linux has 64k pipe buffers. In either case, you can see very close to 2 context switches per 64k transferred. Not exactly sure how that works, but I suspect it's only counting context switches away from
dd
, either to the otherdd
, or to the idle task for that cpu).