Standalone printf
Part of the "expense" of invoking a process is that several resource-intensive things have to happen:
- The executable has to be loaded from disk. This is slow because the HDD has to be accessed to read the binary blob that the executable is stored as.
- The executable is typically built against dynamic libraries, so some secondary files will also have to be loaded (i.e. more binary blob data read from the HDD).
- Operating system overhead. Each process that you invoke incurs overhead in the form of a process ID having to be allocated for it. Space in memory also has to be carved out, both to house the binary data loaded from the HDD in the first two points and for multiple structures that store things such as the process's environment (environment variables etc.).
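To make the process-ID point concrete, here is a minimal bash sketch. It assumes bash plus a POSIX sh on PATH, and the variable names are made up for illustration. It starts an external sh and compares the PID that sh reports with the PID of the calling shell.

```shell
#!/usr/bin/env bash
# Sketch: launching an external program creates a new process with its own PID.
# Assumes bash and a POSIX sh somewhere on $PATH.
set -euo pipefail

shell_pid=$$                       # PID of this bash process
child_pid=$(sh -c 'echo "$$"')     # $$ as reported inside a separately started sh

echo "this bash:   $shell_pid"
echo "external sh: $child_pid"     # a different number: a new process was created
```

The numbers themselves vary from run to run; what matters is that they differ.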
Excerpt of an strace of /usr/bin/printf:
$ strace /usr/bin/printf "%s\n" "hello world"
execve("/usr/bin/printf", ["/usr/bin/printf", "%s\\n", "hello world"], [/* 91 vars */]) = 0
brk(0) = 0xe91000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6b000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=242452, ...}) = 0
mmap(NULL, 242452, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd155a2f000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\357!\3474\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1956608, ...}) = 0
mmap(0x34e7200000, 3781816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34e7200000
mprotect(0x34e7391000, 2097152, PROT_NONE) = 0
mmap(0x34e7591000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x191000) = 0x34e7591000
mmap(0x34e7596000, 21688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34e7596000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2e000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2c000
arch_prctl(ARCH_SET_FS, 0x7fd155a2c720) = 0
mprotect(0x34e7591000, 16384, PROT_READ) = 0
mprotect(0x34e701e000, 4096, PROT_READ) = 0
munmap(0x7fd155a2f000, 242452) = 0
brk(0) = 0xe91000
brk(0xeb2000) = 0xeb2000
brk(0) = 0xeb2000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd14fb9b000
close(3) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6a000
write(1, "hello world\n", 12hello world
) = 12
close(1) = 0
munmap(0x7fd155a6a000, 4096) = 0
close(2) = 0
exit_group(0) = ?
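Looking through the above, you can get a sense of the additional work that /usr/bin/printf has to do because it is a standalone executable: before the single write() that actually prints "hello world", it consults /etc/ld.so.cache, opens and maps /lib64/libc.so.6, and maps the locale archive into memory.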
Builtin printf
With the builtin version of printf, all the libraries it depends on, as well as its binary blob, were already loaded into memory when Bash was invoked, so none of that cost has to be paid again.
Effectively, when you call Bash's builtin "commands", you're really making what amounts to a function call, since everything has already been loaded.
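A rough way to feel the difference is to run both versions in a loop and time them. This is only a sketch: it assumes bash, it locates the external printf with type -P rather than hard-coding /usr/bin/printf, and the default iteration count is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: time the builtin printf against the external one.
# Assumes an external printf exists somewhere on $PATH (e.g. /usr/bin/printf).
set -euo pipefail

ext_printf=$(type -P printf)          # -P forces a PATH search, skipping the builtin
echo "printf is a: $(type -t printf)" # "builtin" in bash
echo "external:    $ext_printf"

n=${1:-500}                           # iteration count (arbitrary default)

# Builtin: handled inside this bash process, no new process per call.
time for ((i = 0; i < n; i++)); do
  printf '%s\n' "hello world" > /dev/null
done

# External: every iteration creates a new process and execs the binary.
time for ((i = 0; i < n; i++)); do
  "$ext_printf" '%s\n' "hello world" > /dev/null
done
```

Run it as, e.g., bash compare_printf.sh 2000 (the script name is hypothetical). Absolute timings depend heavily on the machine, but the external loop is typically much slower because every iteration pays for process creation and loading.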
An analogy
If you've ever worked with a programming language such as Perl, it's equivalent to calling the function system("mycmd") or using backticks (`mycmd`). When you do either of those things, you're forking a separate process with its own overhead, vs. using Perl's core functions.
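Bash has a close analogue of this distinction. In the sketch below, which assumes a bash new enough to support printf -v, command substitution plays the role of Perl's backticks, while printf -v stores its result directly in a variable without creating a new process. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: bash counterpart of Perl's backticks vs. a core function.
set -euo pipefail

# Like `mycmd` in Perl: command substitution runs in a subshell
# (normally a forked child) just to capture printf's output.
padded_sub=$(printf '%05d' 42)

# Like a Perl core function: printf -v writes the result straight
# into a shell variable, with no new process.
printf -v padded_builtin '%05d' 42

echo "$padded_sub $padded_builtin"    # -> 00042 00042
```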
Anatomy of Linux Process Management
There's a pretty good article on IBM developerWorks that breaks down the various aspects of how Linux processes are created and destroyed, along with the different C libraries involved in the process. The article is titled: Anatomy of Linux process management - Creation, management, scheduling, and destruction. It's also available as a PDF.
From the fine manual for bash(1):
ARGUMENTS
If arguments remain after option processing, and neither the -c nor the
-s option has been supplied, the first argument is assumed to be the
name of a file containing shell commands.
Does ls contain shell commands? No, it is a binary file. bash squawks about this fact and fails.
A strace may help show what is going on:
$ strace -o alog bash ls
/usr/bin/ls: /usr/bin/ls: cannot execute binary file
The alog file can get a bit messy, but it shows bash looking for ls in the current working directory (a security risk if someone has placed a naughty ls file somewhere!) and then doing a PATH search:
$ grep ls alog
execve("/usr/bin/bash", ["bash", "ls"], [/* 43 vars */]) = 0
open("ls", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ls", 0x7fff349810f0) = -1 ENOENT (No such file or directory)
stat("/usr/bin/ls", {st_mode=S_IFREG|0755, st_size=117672, ...}) = 0
stat("/usr/bin/ls", {st_mode=S_IFREG|0755, st_size=117672, ...}) = 0
access("/usr/bin/ls", X_OK) = 0
stat("/usr/bin/ls", {st_mode=S_IFREG|0755, st_size=117672, ...}) = 0
access("/usr/bin/ls", R_OK) = 0
stat("/usr/bin/ls", {st_mode=S_IFREG|0755, st_size=117672, ...}) = 0
stat("/usr/bin/ls", {st_mode=S_IFREG|0755, st_size=117672, ...}) = 0
access("/usr/bin/ls", X_OK) = 0
stat("/usr/bin/ls", {st_mode=S_IFREG|0755, st_size=117672, ...}) = 0
access("/usr/bin/ls", R_OK) = 0
open("/usr/bin/ls", O_RDONLY) = 3
As to why this could be a security risk: suppose you run bash somecmd from the wrong directory, one where someone has created an ls (or a file named after some other well-known command, possibly due to a bug in a script):
$ echo "echo rm -rf /" > ls
$ bash ls
rm -rf /
$
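To reproduce this safely, here is a minimal sketch that works in a throwaway directory with a harmless fake ls; the file contents and messages are made up for illustration. It also contrasts bash -c which, per the ARGUMENTS text above, treats its argument as a command string rather than as the name of a script file.

```shell
#!/usr/bin/env bash
# Sketch: `bash NAME` opens NAME as a script, trying the current directory
# before $PATH; `bash -c NAME` runs NAME as a command instead.
set -euo pipefail

demo_dir=$(mktemp -d)               # scratch directory for the fake ls
trap 'rm -rf "$demo_dir"' EXIT      # clean up when the script exits
cd "$demo_dir"

# Harmless stand-in for the "naughty" ls: it only prints a message.
echo 'echo "fake ls script ran"' > ls

script_out=$(bash ls)               # reads ./ls as shell commands
echo "bash ls    -> $script_out"    # -> bash ls    -> fake ls script ran

cmd_kind=$(bash -c 'type -t ls')    # ls resolved as a command name via PATH
echo "bash -c ls -> ls is a $cmd_kind"
```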
read is a shell builtin, i.e. a command that is provided by the shell itself rather than by an external program. For more information about shell builtins, see What is the difference between a builtin command and one that is not?
read needs to be a builtin because it modifies the state of the shell: specifically, it sets variables to the data it reads. It's impossible for an external command to set variables of the shell that calls it. See also Why is cd not a program?
Some systems also have an external command called read, for debatable compliance reasons. The external command can't do the whole job of the builtin: it can read a line of input, but it can't set shell variables to what it read, so it can only be used to discard a line of input, not to process it.
which read doesn't tell you that a builtin exists because that's not its job. which itself is an external command in bash and other Bourne-style shells (excluding zsh), so it only reports information about external commands. There's very rarely any good reason to call which. The command to find out what a command name stands for is type.
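The sketch below illustrates both points. It assumes bash, and because an external read command isn't present on every system, a child bash process stands in for it (a hypothetical substitute, not the external read itself).

```shell
#!/usr/bin/env bash
# Sketch: the builtin read sets variables in the current shell; a read
# running in another process cannot. A child bash stands in for an
# external read command, which may not exist on every system.
set -euo pipefail

read -r line <<< "first line"                 # builtin: sets $line in this shell
echo "after builtin read: $line"              # -> after builtin read: first line

unset line
bash -c 'read -r line; echo "child read: $line"' <<< "second line"
echo "after child read: ${line-(unset)}"      # -> after child read: (unset)

type -t read      # -> builtin  (type, unlike which, knows about builtins)
type -a printf    # the builtin plus any printf executables found on PATH
```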