In a bash script I need various values from /proc/ files. Until now I have dozens of lines grepping the files directly, like this:
grep -oP '^MemFree: *\K[0-9]+' /proc/meminfo
In an effort to make that more efficient, I saved the file content in a variable and grepped that:
a=$(</proc/meminfo)
echo "$a" | grep -oP '^MemFree: *\K[0-9]+'
Instead of opening the file multiple times, this should open it just once and grep the variable content, which I assumed would be faster, but in fact it is slower:
bash 4.4.19 $ time for i in {1..1000};do grep ^MemFree /proc/meminfo;done >/dev/null
real 0m0.803s
user 0m0.619s
sys 0m0.232s
bash 4.4.19 $ a=$(</proc/meminfo)
bash 4.4.19 $ time for i in {1..1000};do echo "$a"|grep ^MemFree; done >/dev/null
real 0m1.182s
user 0m1.425s
sys 0m0.506s
The same is true for dash and zsh. I suspected the special nature of /proc/ files as the reason, but when I copy the content of /proc/meminfo to a regular file and use that, the results are the same:
bash 4.4.19 $ cat </proc/meminfo >meminfo
bash 4.4.19 $ time for i in $(seq 1 1000);do grep ^MemFree meminfo; done >/dev/null
real 0m0.790s
user 0m0.608s
sys 0m0.227s
Using a here string to save the pipe makes it slightly faster, but still not as fast as with the files:
bash 4.4.19 $ time for i in $(seq 1 1000);do <<<"$a" grep ^MemFree; done >/dev/null
real 0m0.977s
user 0m0.758s
sys 0m0.268s
Why is opening a file faster than reading the same content from a variable?
Best Answer
Here, it's not about opening a file versus reading a variable's content, but about forking an extra process or not.
grep -oP '^MemFree: *\K[0-9]+' /proc/meminfo
forks a process that executes grep, which opens /proc/meminfo (a virtual file, in memory, no disk I/O involved), reads it and matches the regexp.
The most expensive part of that is forking the process and loading the grep utility and its library dependencies, doing the dynamic linking, and opening the locale database: dozens of files that are on disk (but likely cached in memory).
The part about reading /proc/meminfo is insignificant in comparison: the kernel needs little time to generate the information in there, and grep needs little time to read it.
If you run strace -c on that, you'll see that the one open() and one read() system calls used to read /proc/meminfo are peanuts compared to everything else grep does to start (strace -c doesn't count the forking).
In:
a=$(</proc/meminfo)
In most shells that support that $(<...) ksh operator, the shell just opens the file and reads its content (and strips the trailing newline characters). bash is different and much less efficient in that it forks a process to do that reading and passes the data to the parent via a pipe. But here, it's done once, so it doesn't matter.
In:
printf '%s\n' "$a" | grep '^MemFree'
The shell needs to spawn two processes, which run concurrently and interact with each other via a pipe. Creating and tearing down that pipe, and writing to and reading from it, has some small cost. The much greater cost is spawning the extra process. The scheduling of the processes has some impact as well.
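Whether a shell feeds redirected data to a command through a pipe or through a temporary file varies between shells and versions, and it can be observed directly. A minimal sketch, assuming Linux (it relies on /proc/self/fd/0 and on readlink from coreutils) and using a hypothetical helper name heredoc_stdin:

```shell
#!/bin/sh
# Linux-only sketch: show what a here-document is connected to by letting
# readlink resolve its own standard input through /proc/self/fd/0.
# A pipe shows up as "pipe:[inode]", a temporary file as a file path.
# heredoc_stdin is a hypothetical helper name, for illustration only.
heredoc_stdin() {
  readlink /proc/self/fd/0 <<EOF
$1
EOF
}

heredoc_stdin 'MemFree:         1234567 kB'
```

Running the same script with dash should print a pipe:[...] entry, while zsh and bash 4.4 (the version used in the question) should print the path of a deleted temporary file. As far as I know, bash 5.1 and later use a pipe instead when the content fits in the pipe buffer.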
You may find that using the zsh <<< operator makes it slightly quicker:
grep '^MemFree' <<< "$a"
In zsh and bash, that's done by writing the content of $a to a temporary file. That is less expensive than spawning an extra process, but will probably not give you any gain compared to getting the data straight off /proc/meminfo. It's still less efficient than your approach that copies /proc/meminfo to disk, as the writing of the temp file is done at each iteration.
dash doesn't support here-strings, but its heredocs are implemented with a pipe that doesn't involve spawning an extra process. In:
grep '^MemFree' << EOF
$a
EOF
The shell creates a pipe and forks a process. The child executes grep with its stdin as the reading end of the pipe, and the parent writes the content at the other end of the pipe.
But that pipe handling and process synchronisation is still likely to be more expensive than just getting the data straight off /proc/meminfo.
The content of /proc/meminfo is short and takes little time to produce. If you want to save some CPU cycles, you want to remove the expensive parts: forking processes and running external commands.
Like:
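A minimal sketch of that idea, assuming any POSIX shell: /proc/meminfo is read once with the builtin read, and the wanted fields are picked out with case, so no grep is forked per value. The function name meminfo_values and the mem_* variables are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: read a meminfo-style file once and pick out several values
# using only shell builtins (read, case), so no grep process is forked per value.
# meminfo_values and the mem_* variable names are hypothetical, for illustration.
meminfo_values() {
  mem_total= mem_free= mem_available=
  # IFS=': ' splits "MemFree:     1234567 kB" into key, value and unit.
  while IFS=': ' read -r key value _unit; do
    case $key in
      MemTotal) mem_total=$value ;;
      MemFree) mem_free=$value ;;
      MemAvailable) mem_available=$value ;;
    esac
  done < "${1:-/proc/meminfo}"
}

# Only run the demo where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
  meminfo_values
  printf 'MemTotal=%s MemFree=%s MemAvailable=%s\n' \
    "$mem_total" "$mem_free" "$mem_available"
fi
```

Called with a file name, e.g. meminfo_values ./meminfo, it parses the regular-file copy from the question instead. Adding another field is one more case branch, and the file is still read only once.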
Avoid bash, though, whose pattern matching is very inefficient. With zsh -o extendedglob, you can shorten it to:
Note that ^ is special in many shells (Bourne, fish, rc, es and zsh with the extendedglob option at least), so I'd recommend quoting it. Also note that echo can't be used to output arbitrary data (hence my use of printf above).
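A small sketch of the echo caveat, assuming a POSIX shell and a hypothetical helper name print_lines: printf '%s\n' writes its argument verbatim, whatever it contains.

```shell
#!/bin/sh
# print_lines (a hypothetical helper) writes each argument verbatim on its
# own line. printf '%s\n' never interprets its data: a leading -n or -e is not
# taken as an option and backslash sequences such as \n are not expanded,
# which some echo implementations do.
print_lines() {
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

# Sample values that echo may mangle, depending on the shell.
print_lines '-n' 'a\nb' 'MemFree:         1234567 kB'
```

For comparison, with a=-n, bash's builtin echo "$a" prints nothing, and with a='a\nb', dash's echo "$a" prints two lines; the printf version prints all three sample values unchanged in both shells.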