Linux – Why is reading a FILE faster than reading a VARIABLE

bash, linux, performance, shell

I don't understand the results of a simple performance test I ran with two basic scripts (running on a high-end server):

perfVar.zsh:

#!/bin/zsh -f

# read the entire file into a shell variable via command substitution
MYVAR=`cat $1`
for i in {1..10}
do
  echo $MYVAR
done

perfCat.zsh:

#!/bin/zsh -f

for i in {1..10}
do
  cat $1
done

Performance test results:

> time ./perfVar.zsh BigTextFile > /dev/null
./perfVar.zsh FE > /dev/null  6.86s user 0.32s system 100% cpu 7.177 total
> time ./perfCat.zsh BigTextFile > /dev/null
./perfCat.zsh FE > /dev/null  0.01s user 0.10s system 91% cpu 0.118 total

I would have thought that accessing a VARIABLE would be far faster than reading a FILE from the file system… Why this result?
Is there a way to optimize the perfCat.zsh script by reducing the number of accesses to the file system?

Best Answer

I was able to reproduce the same behavior in Bash. The main problem here is that you're using shell variables in a way they weren't designed for, and therefore are not optimized for. When you run 'echo $MYVAR', the shell has to build a command line containing the entire contents of $MYVAR (even though 'echo' is a built-in command, there is still a command line).

So the shell expands $MYVAR into a large string, which is then parsed again and split on whitespace into a list of individual arguments for the echo command. (Note that this collapses consecutive whitespace characters in the input file into single spaces.) Strictly speaking, zsh does not word-split unquoted expansions by default the way Bash does, but it still has to build and scan the huge argument, so the cost remains. Clearly, this process is not very efficient with large strings.
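As a quick illustration, here is a minimal Bash sketch of that collapsing effect (/tmp/demo.txt is a throwaway file name, not from the original post):

#!/bin/bash
# Demonstrate the whitespace collapsing described above.
printf 'a    b\t\tc\n' > /tmp/demo.txt
VAR=$(cat /tmp/demo.txt)
echo $VAR      # unquoted: word splitting collapses the runs -> "a b c"
echo "$VAR"    # quoted: the original spaces and tabs are preserved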

You should just use the 'cat bigfile' method multiple times and let the OS's file system cache do its job, speeding up the repeated access to the big file. You also avoid the subtle (and possibly unwanted) modification the shell makes to the string when you use echo, and the 'cat' method works with binary files, where the shell method could break on binary data.
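If you do want to keep the contents in a variable, one possible sketch (assuming the whole file fits comfortably in memory) is to quote the expansion and print it with printf, which avoids the word splitting and whitespace collapsing, though the shell still has to pass the entire contents as one huge argument on each iteration:

#!/bin/zsh -f
# Sketch only: keeps the file in a variable but avoids re-splitting it.
# Repeated cat will usually still win for big files, thanks to the page cache.
MYVAR=$(cat "$1")
for i in {1..10}
do
  printf '%s\n' "$MYVAR"   # quoted: no word splitting, whitespace preserved
done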
