Standalone printf
Part of the "expense" of invoking a process is that several resource-intensive things have to happen:
- The executable has to be loaded from disk. This is slow because the HDD has to be accessed in order to read the binary blob that the executable is stored as.
- The executable is typically built against dynamic libraries, so some secondary files also have to be loaded (i.e. more binary blob data being read from the HDD).
- Operating system overhead. Each process that you invoke incurs overhead in the form of a process ID having to be created for it. Space in memory also has to be carved out to house the binary data loaded from the HDD in steps 1 & 2, and multiple structures have to be populated to store things such as the process's environment (environment variables etc.).
Excerpt of an strace of /usr/bin/printf:
$ strace /usr/bin/printf "%s\n" "hello world"
execve("/usr/bin/printf", ["/usr/bin/printf", "%s\\n", "hello world"], [/* 91 vars */]) = 0
brk(0) = 0xe91000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6b000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=242452, ...}) = 0
mmap(NULL, 242452, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd155a2f000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\357!\3474\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1956608, ...}) = 0
mmap(0x34e7200000, 3781816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34e7200000
mprotect(0x34e7391000, 2097152, PROT_NONE) = 0
mmap(0x34e7591000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x191000) = 0x34e7591000
mmap(0x34e7596000, 21688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34e7596000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2e000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2c000
arch_prctl(ARCH_SET_FS, 0x7fd155a2c720) = 0
mprotect(0x34e7591000, 16384, PROT_READ) = 0
mprotect(0x34e701e000, 4096, PROT_READ) = 0
munmap(0x7fd155a2f000, 242452) = 0
brk(0) = 0xe91000
brk(0xeb2000) = 0xeb2000
brk(0) = 0xeb2000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd14fb9b000
close(3) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6a000
write(1, "hello world\n", 12hello world
) = 12
close(1) = 0
munmap(0x7fd155a6a000, 4096) = 0
close(2) = 0
exit_group(0) = ?
Looking through the above, you can get a sense of the additional resources that /usr/bin/printf has to consume because it is a standalone executable.
Builtin printf
With the builtin version of printf, all the libraries that it depends on, as well as its binary blob, were already loaded into memory when Bash was invoked, so none of that cost has to be incurred again.
Effectively, when you call one of Bash's builtin "commands", you're really making what amounts to a function call, since everything has already been loaded.
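A rough way to feel the difference is to make the same call many times through the builtin and through the standalone executable. The sketch below is only an illustration: run_n is a hypothetical helper (not something Bash provides), and env printf is used here as one way of forcing a lookup of the executable on $PATH instead of the builtin.

```shell
# run_n is a hypothetical helper: run a command N times, discarding its output.
run_n() (
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null
        i=$(($i + 1))
    done
)

# Builtin: each call is effectively a function call inside the running shell.
run_n 500 printf '%s\n' 'hello world'

# Standalone: env finds printf on $PATH, so every call is a fork plus an exec.
run_n 500 env printf '%s\n' 'hello world'
```

In Bash, prefixing each of the two run_n lines with time shows the gap. The exact numbers depend on the machine, so none are quoted here.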
An analogy
If you've ever worked with a programming language such as Perl, calling a standalone executable is equivalent to calling the function system("mycmd") or using backticks (`mycmd`). When you do either of those things, you're forking a separate process with its own overhead, as opposed to using the functions that are offered to you through Perl's core.
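The same trade-off shows up inside shell scripts themselves. As a sketch in shell rather than Perl, the two hypothetical functions below both strip the directory part from a path: the first runs basename, which is usually an external utility, and the second uses a parameter expansion that the shell evaluates itself. The function names and the sample path are made up for illustration.

```shell
# Runs the basename utility: typically a fork and an exec per call.
name_external() {
    basename "$1"
}

# Uses only parameter expansion, handled inside the current shell process.
name_builtin() {
    printf '%s\n' "${1##*/}"
}

name_external /usr/bin/printf
name_builtin /usr/bin/printf
```

Both calls print printf. The two only agree for simple paths like this one; basename also copes with trailing slashes, which this expansion does not.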
Anatomy of Linux Process Management
There's a pretty good article on IBM developerWorks that breaks down the various aspects of how Linux processes are created and destroyed, along with the different C libraries involved in the process. The article is titled: Anatomy of Linux process management - Creation, management, scheduling, and destruction. It's also available as a PDF.
SHELL SEQ:
Probably a useful means of benchmarking a shell's performance is to do a lot of very small, simple evaluations repetitively. It is important, I think, not just to loop, but to loop over input, because a shell needs to read <&0.
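As a minimal sketch of what "looping over input" means here, the hypothetical count_lines function below reads stdin one line at a time and performs one arithmetic evaluation per line. It is only a toy to show the shape of the work, not a benchmark in its own right.

```shell
# count_lines is a hypothetical name: read stdin line by line, counting as we go.
count_lines() (
    n=0
    while read -r line; do
        n=$(($n + 1))          # one arithmetic evaluation per line read
    done
    printf '%s\n' "$n"
)

# Feed it three lines of input; every iteration costs the shell a read on fd 0.
printf '%s\n' a b c | count_lines
```

This prints 3. Scaling the input up to hundreds of thousands of lines is what turns the loop into a measurement of the shell itself.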
I thought this would complement the tests @cuonglm already posted because it demonstrates a single shell process's performance once invoked, as opposed to his which demonstrates how quickly a shell process loads when invoked. In this way, between us, we cover both sides of the coin.
Here's a function to facilitate the demo:
sh_bench() ( #dont copy+paste comments
o=-c sh=$(command -v "$1") ; shift #get shell $PATH; toss $1
[ -z "${sh##*busybox}" ] && o='ash -c' #cause its weird
set -- "$sh" $o "'$(cat <&3)'" -- "$@" #$@ = invoke $shell
time env - "$sh" $o "while echo; do echo; done|$*" #time (env - sh|sh) AC/DC
) 3<<-\SCRIPT
#Everything from here down is run by the different shells
i="${2:-1}" l="${1:-100}" d="${3:-
}"; set -- "\$((n=\$n\${n:++\$i}))\$d" #prep loop; prep eval
set -- $1$1$1$1$1$1$1$1$1$1 #yup
while read m #iterate on input
do [ $(($i*50+${n:=-$i})) -gt "$(($l-$i))" ] || #eval ok?
eval echo -n \""$1$1$1$1$1"\" #yay!
[ $((n=$i+$n)) -gt "$(($l-$i))" ] && #end game?
echo "$n" && exit #and EXIT
echo -n "$n$d" #damn - maybe next time
done #done
#END
SCRIPT #end heredoc
It either increments a variable once per newline read or, as a slight optimization, if it can, it increments 50 times per newline read. Every time the variable is incremented it is printed to stdout. It behaves a lot like a sort of seq cross nl.
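Stripped of the batching optimization and of the input loop, the observable behavior can be sketched as below. seq_like is a hypothetical name, and the argument order (limit, increment, delimiter) is inferred from the script's ${1:-100}, ${2:-1} and ${3:-newline} defaults. Treat it as an approximation of the output, not a replacement for the benchmark.

```shell
# seq_like is a hypothetical stand-in: count from 0 to LIMIT in steps of STEP,
# writing DELIM between numbers and a newline after the last one.
seq_like() (
    nl='
'
    limit=${1:-100} step=${2:-1} delim=${3:-$nl}
    n=0
    while [ "$n" -lt "$limit" ]; do
        printf '%s%s' "$n" "$delim"
        n=$(($n + $step))
    done
    printf '%s\n' "$n"
)

seq_like 20 5 dash
```

The call above prints 0dash5dash10dash15dash20. With only a limit given, the delimiter defaults to a newline, so seq_like 3 prints four lines (0 through 3).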
And just to make it very clear what it does, here's some truncated set -x; output after inserting it just before time in the function above:
time env - /usr/bin/busybox ash -c '
while echo; do echo; done |
/usr/bin/busybox ash -c '"'$(
cat <&3
)'"' -- 20 5 busybox'
So each shell is first called like:
env - $shell -c "while echo; do echo; done |..."
...to generate the input that it will need to loop over when it reads in 3<<\SCRIPT - or when cat does, anyway. And on the other side of that |pipe it calls itself again like:
"...| $shell -c '$(cat <<\SCRIPT)' -- $args"
So aside from the initial call to env (because cat is actually called in the previous line), no other processes are invoked from the time it is called until it exits. At least, I hope that's true.
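The calling pattern can be tried in isolation. In this sketch the endless while echo loop feeds a second sh, which receives its script as the -c string and its arguments after --. The inner script is a stand-in that just counts the lines it reads, not the benchmark script; pipe_demo is a hypothetical name, sh stands for whichever shell is being tested, and the env - wrapper is left out for brevity.

```shell
# Inner script (a stand-in): read lines until $1 of them have been seen.
inner='n=0
while read m; do
    n=$(($n + 1))
    [ "$n" -ge "$1" ] && break
done
echo "read $n lines"'

# The left side generates newlines forever; it stops once the reader has
# exited and echo can no longer write to the pipe.
pipe_demo() {
    sh -c 'while echo; do echo; done | sh -c "$1" -- "$2"' -- "$inner" "$1"
}

pipe_demo 5
```

This prints read 5 lines. Because -- lands in $0 of each sh -c invocation, the values that follow it arrive as $1, $2 and so on.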
Before the numbers...
I should make some notes on portability.
- posh doesn't like $((n=n+1)) and insists on $((n=$n+1))
- mksh doesn't have a printf builtin in most cases. Earlier tests had it lagging a great deal - it was invoking /usr/bin/printf for every run. Hence the echo -n above.
- maybe more as I remember it...
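Both notes can be checked from inside whichever shell is under test. The sketch below is a convenience resting on an assumption, not part of the benchmark: has_builtin is a hypothetical helper that looks for the word builtin in the output of command -V, whose exact wording is not standardized and may differ between shells.

```shell
# has_builtin is a hypothetical helper: succeeds if the shell reports NAME as a builtin.
has_builtin() {
    case $(command -V "$1" 2>/dev/null) in
        *builtin*) return 0 ;;
        *)         return 1 ;;
    esac
}

if has_builtin printf; then
    echo "printf is a builtin here"
else
    echo "printf is not a builtin here"
fi

# The $n spelling inside $(( )) is the form that posh accepts as well.
n=0
n=$(($n + 1))
echo "$n"
```

If the second message appears, every printf in a loop costs a process invocation, which is the lag described for mksh.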
Anyway, to the numbers:
for sh in dash busybox posh ksh mksh zsh bash
do sh_bench $sh 20 5 $sh 2>/dev/null
sh_bench $sh 500000 | wc -l
echo ; done
That'll get 'em all in one go...
0dash5dash10dash15dash20
real 0m0.909s
user 0m0.897s
sys 0m0.070s
500001
0busybox5busybox10busybox15busybox20
real 0m1.809s
user 0m1.787s
sys 0m0.107s
500001
0posh5posh10posh15posh20
real 0m2.010s
user 0m2.060s
sys 0m0.067s
500001
0ksh5ksh10ksh15ksh20
real 0m2.019s
user 0m1.970s
sys 0m0.047s
500001
0mksh5mksh10mksh15mksh20
real 0m2.287s
user 0m2.340s
sys 0m0.073s
500001
0zsh5zsh10zsh15zsh20
real 0m2.648s
user 0m2.223s
sys 0m0.423s
500001
0bash5bash10bash15bash20
real 0m3.966s
user 0m3.907s
sys 0m0.213s
500001
ARBITRARY = MAYBE OK?
Still, this is a rather arbitrary test, but it does test reading input, arithmetic evaluation, and variable expansion. Maybe not comprehensive, but possibly near to there.
EDIT by Teresa e Junior: @mikeserv and I have done many other tests (see our chat for details), and we found the results could be summarized like this:
- If you need speed, definitely go with dash: it is much faster than any other shell and about 4x faster than bash.
- While busybox's shell can be much slower than dash, in some tests it could be faster, because it has many of its own userland utilities, like grep, sed, sort, etc., which don't have as many features as the commonly used GNU utilities but can still get the work done.
- If speed is not everything you care about, ksh (or ksh93) can be considered the best compromise between speed and features. Its speed is comparable to that of the smaller mksh, which is way faster than bash, and it also has some unique features, like floating point arithmetic.
- Although bash is famous for its simplicity, stability, and functionality, it was the slowest of all shells in the majority of our tests, and by a large margin.
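For the single benchmark listed earlier, the ratios can be worked out from the real times in that output. This is just arithmetic on those seven numbers, using awk as a calculator inside a hypothetical ratios function; it says nothing about the other tests mentioned in the edit.

```shell
# Real times (seconds) copied from the benchmark output earlier in this answer.
# Each is divided by dash's time to give a "how many times slower" figure.
ratios() {
    awk -v base=0.909 '{ printf "%-8s %.1fx\n", $1, $2 / base }' <<'EOF'
dash 0.909
busybox 1.809
posh 2.010
ksh 2.019
mksh 2.287
zsh 2.648
bash 3.966
EOF
}

ratios
```

On these numbers bash comes out at roughly 4.4 times dash's real time, which is in line with the "about 4x" figure.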
Best Answer
Prompted by @josten, I ran a comparison on the two. The code is on GitHub. In short [1]:
- The User+Sys time taken by cmp -s seemed to be a tad more than that of diff in most cases.
- However, the Real time taken was pretty much arbitrary - cmp ahead on some, diff ahead on some.
Summary: Any difference in performance is pure coincidence. Use whatever you wish.
[1] The images are 1920x450, so do open them in a tab to see them in their full glory.