Bash – Why is bash’s printf faster than /usr/bin/printf

bashperformanceshell-builtin

I have two ways of calling printf on my system:

$ type -a printf
printf is a shell builtin
printf is /usr/bin/printf
$ file /usr/bin/printf
/usr/bin/printf: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), dynamically
linked (uses shared libs), for GNU/Linux 2.6.32,
BuildID[sha1]=d663d220e5c2a2fc57462668d84d2f72d0563c33, stripped

So, one is a bash builtin and the other is a proper compiled executable. I would have expected a program whose only job is printf to be much faster than the shell builtin. Granted, the builtin is already loaded into memory but the actual execution time should be faster in a dedicated program right? It would be optimized to do one thing very well in the best of Unix philosophy.

Apparently not:

$ >/tmp/foo; time for i in `seq 1 3000`; do printf '%s ' "$i" >> /tmp/foo; done;
real    0m0.065s
user    0m0.036s
sys     0m0.024s

$ >/tmp/foo; time for i in `seq 1 3000`; do /usr/bin/printf '%s ' "$i" >> /tmp/foo; done;   
real    0m18.097s
user    0m1.048s
sys     0m7.124s

A lot of this, as @Guru points out is because of the cost of creating threads which is only incurred by /usr/bin/printf. If that were all, I would expect the executable to be faster than the builtin if run outside a loop. Unfortunately, /usr/bin/printf has a limit to the size of a variable that it can take, so I could only test this with a relatively short string:

$ i=$(seq 1 28000 | awk '{k=k$1}END{print k}'); time /usr/bin/printf '%s ' "$i" > /dev/null; 

real    0m0.035s
user    0m0.004s
sys     0m0.028s

$ i=$(seq 1 28000 | awk '{k=k$1}END{print k}'); time printf '%s ' "$i" > /dev/null; 

real    0m0.008s
user    0m0.008s
sys     0m0.000s

The builtin is still consistently and significantly faster. To make it even clearer, lets make both start new processes:

$ time for i in `seq 1 1000`; do /usr/bin/printf '%s ' "$i" >/dev/null; done;   
real    0m33.695s
user    0m0.636s
sys     0m30.628s

$ time for i in `seq 1 1000`; do bash -c "printf '%s ' $i" >/dev/null; done;   

real    0m3.557s
user    0m0.380s
sys     0m0.508s

The only reason I can think of is that the variable being printed is internal to bash and can be passed directly to the builtin. Is that enough to explain the difference in speed? What other factors are at play?

Best Answer

Standalone printf

Part of the "expense" in invoking a process is that several things have to happen that are resource intensive.

The executable has to be loaded from the disk, this incurs slowness since the HDD has be be accessed in order to load the binary blob from the disk which the executable is stored as.
The executable is typically built using dynamic libraries, so some secondary files to the executable will also have to be loaded, (i.e. more binary blob data being read from the HDD).
Operating system overhead. Each process that you invoke incurs overhead in the form of a process ID having to be created for it. Space in memory will also have be carved out to both house the binary data being loaded from the HDD in steps 1 & 2, as well as multiple structures having to be populated to store things such as the processes' environment (environment variables etc.)

excerpt of an strace of /usr/bin/printf

    $ strace /usr/bin/printf "%s\n" "hello world"
    *execve("/usr/bin/printf", ["/usr/bin/printf", "%s\\n", "hello world"], [/* 91 vars */]) = 0
    brk(0)                                  = 0xe91000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6b000
    access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY)      = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=242452, ...}) = 0
    mmap(NULL, 242452, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd155a2f000
    close(3)                                = 0
    open("/lib64/libc.so.6", O_RDONLY)      = 3
    read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\357!\3474\0\0\0"..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=1956608, ...}) = 0
    mmap(0x34e7200000, 3781816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34e7200000
    mprotect(0x34e7391000, 2097152, PROT_NONE) = 0
    mmap(0x34e7591000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x191000) = 0x34e7591000
    mmap(0x34e7596000, 21688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34e7596000
    close(3)                                = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2e000
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2c000
    arch_prctl(ARCH_SET_FS, 0x7fd155a2c720) = 0
    mprotect(0x34e7591000, 16384, PROT_READ) = 0
    mprotect(0x34e701e000, 4096, PROT_READ) = 0
    munmap(0x7fd155a2f000, 242452)          = 0
    brk(0)                                  = 0xe91000
    brk(0xeb2000)                           = 0xeb2000
    brk(0)                                  = 0xeb2000
    open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
    mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd14fb9b000
    close(3)                                = 0
    fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6a000
    write(1, "hello world\n", 12hello world
    )           = 12
    close(1)                                = 0
    munmap(0x7fd155a6a000, 4096)            = 0
    close(2)                                = 0
    exit_group(0)                           = ?*

Looking through the above you can get a sense of the additional resources that /usr/bin/printf is having to incur due to it being a standalone executable.

Builtin printf

With the built version of printf all the libraries that it depends on as well as its binary blob have already been loaded into memory when Bash was invoked. So none of that has to be incurred again.

Effectively when you call the builtin "commands" to Bash, you're really making what amounts to a function call, since everything has already been loaded.

An analogy

If you've ever worked with a programming language, such as Perl, it's equivalent to making calls to the function (system("mycmd")) or using the backticks (`mycmd`). When you do either of those things, you're forking a separate process with it's own overhead, vs. using the functions that are offered to you through Perl's core functions.

Anatomy of Linux Process Management

There's a pretty good article on IBM Developerworks that breaks down the various aspects of how Linux processes are created and destroyed along with the different C libraries involved in the process. The article is titled:Anatomy of Linux process management - Creation, management, scheduling, and destruction. It's also available as a PDF.

Related Solutions

Bash – Why is deleting files by name painfully slow and also exceptionally fast

rm -r is expected to be slow as its recursive. A depth first traversal has to be made on the directory structure.

Now how did you create 10 million files ? did u use some script which loops on some order ? 1.txt,2.txt,3.txt... if yes then those files may too be allocated on same order in contigous blocks in hdd.so deleting on same order will be faster.

"ls -f" will enable -aU which lists in directory order which is again recursive.

Bash Shell Builtin – Why Parameters Are Optional

Bash built-ins are inconsistent and poorly documented.

Here's an example:

$ help command
command: command [-pVv] command [arg ...]
    Runs COMMAND with ARGS ignoring shell functions.  If you have a shell
    function called 'ls', and you wish to call the command `ls', you can
    say "command ls".  If the -p option is given, a default value is used
    for PATH that is guaranteed to find all of the standard utilities.  If
    the -V or -v option is given, a string is printed describing COMMAND.
    The -V option produces a more verbose description.
$ command; echo $?
0

Even without command the return code $? -eq 0 and there is no error on std err.

Another one:

$ help disown
disown: disown [-h] [-ar] [jobspec ...]
    By default, removes each JOBSPEC argument from the table of active jobs.
    If the -h option is given, the job is not removed from the table, but is
    marked so that SIGHUP is not sent to the job if the shell receives a
    SIGHUP.  The -a option, when JOBSPEC is not supplied, means to remove all
    jobs from the job table; the -r option means to remove only running jobs.
$ disown; echo $?
-bash: disown: current: no such job
1

All the arguments are optional but it returns $? -eq 1 when there are none.

I've even compiled the newest Bash 4.2 and here are my results:

$ help command
command: command [-pVv] command [arg ...]
    Execute a simple command or display information about commands.

    Runs COMMAND with ARGS suppressing  shell function lookup, or display
    information about the specified COMMANDs.  Can be used to invoke commands
    on disk when a function with the same name exists.

    Options:
      -p    use a default value for PATH that is guaranteed to find all of
        the standard utilities
      -v    print a description of COMMAND similar to the `type' builtin
      -V    print a more verbose description of each COMMAND

    Exit Status:
    Returns exit status of COMMAND, or failure if COMMAND is not found.
$ command; echo $?
0

There's a new section "Exit Status" and command is still an optional argument. Even worse than 3.x. The same for other built-ins.

So, you're right. Bash built-ins are a mess and should be fixed.