Bash – Why does echo >file use more real time than echo | sed >file

bashechopipeshelltime

The example, below, surprised me. It seems to be counter intuitive… aside from the fact that there is a whisker more user time for the echo | sed combo.

Why is echo using so much sys time when it runs alone, or should the question be, How does sed change the state of play? It seems that echo would needs to do the same echo-ing in both cases…

time echo -n a\ {1..1000000}\ c$'\n' >file

# real    0m9.481s
# user    0m5.304s
# sys     0m4.172s

time echo -n a\ {1..1000000}\ c$'\n' |sed s/^\ // >file

# real    0m5.955s
# user    0m5.488s
# sys     0m1.580s

Best Answer

bahamat and Alan Curry have it right: this is due to the way your shell buffers the output of echo. Specifically, your shell is bash, and it issues one write system call per line. Hence the first snippet makes 1000000 writes to a disk file, whereas the second snippet makes 1000000 writes to a pipe and sed (largely in parallel, if you have multiple CPUs) makes a considerably smaller number of writes to a disk file due to its output buffering.

You can observe what's going on by running strace.

$ strace -f -e write bash -c 'echo -n a\ {1..2}\ c$'\'\\n\'' >file'
write(1, "a 1 c\n", 6)                  = 6
write(1, " a 2 c\n", 7)                 = 7
$ strace -f -e write bash -c 'echo -n a\ {1..2}\ c$'\'\\n\'' | sed "s/^ //" >file'
Process 28052 attached
Process 28053 attached
Process 28051 suspended
[pid 28052] write(1, "a 1 c\n", 6)      = 6
[pid 28052] write(1, " a 2 c\n", 7)     = 7
Process 28051 resumed
Process 28052 detached
Process 28051 suspended
[pid 28053] write(1, "a 1 c\na 2 c\n", 12) = 12
Process 28051 resumed
Process 28053 detached
--- SIGCHLD (Child exited) @ 0 (0) ---

Other shells such as ksh buffer the output of echo even when it's multiline, so you won't see much of a difference.

$ strace -f -e write ksh -c 'echo -n a\ {1..2}\ c$'\'\\n\'' >file'
write(1, "a 1 c\n a 2 c\n", 13)         = 13
$ strace -f -e write ksh -c 'echo -n a\ {1..2}\ c$'\'\\n\'' | sed "s/^ //" >file'
Process 28058 attached
[pid 28058] write(1, "a 1 c\n a 2 c\n", 13) = 13
Process 28058 detached
--- SIGCHLD (Child exited) @ 0 (0) ---
write(1, "a 1 c\na 2 c\n", 12)          = 12

With bash I get similar timing ratios. With ksh I see the second snippet running slower.

ksh$ time echo -n a\ {1..1000000}\ c$'\n' >file

real    0m1.44s
user    0m1.28s
sys     0m0.06s
ksh$ time echo -n a\ {1..1000000}\ c$'\n' | sed "s/^ //" >file

real    0m2.38s
user    0m1.52s
sys     0m0.14s

Related Solutions

Bash vs GNU Time – Why Bash Time Is More Precise

After some hardcore bash code examining I found out that bash time uses getrusage() and GNU time uses times(). getrusage() is far more precise because of microsecond resolution.

Why real time can be lower than user time

The output you show is a bit odd, since real time would usually be bigger than the other two.

Real time is wall clock time. (what we could measure with a stopwatch)
User time is the amount of time spend in user-mode within the process
Sys is the CPU time spend in the kernel within the process.

So I suppose if the work was done by several processors concurrently, the CPU time would be higher than the elapsed wall clock time.

Was this a concurrent/multi-threaded/parallel type of application?

Just as an example, this is what I get on my Linux system when I issue the time find . command. As expected the elapsed real time is much larger than the others on this single user/single core process.

real    0m5.231s
user    0m0.072s
sys     0m0.088s

The rule of thumb is:

real < user: The process is CPU bound and takes advantage of parallel execution on multiple cores/CPUs.
real ≈ user: The process is CPU bound and takes no advantage of parallel exeuction.
real > user: The process is I/O bound. Execution on multiple cores would be of little to no advantage.

Best Answer

Related Solutions

Bash vs GNU Time – Why Bash Time Is More Precise

Why real time can be lower than user time

Related Question