Shell – string fill performance in shell scripts: what is the best amount and for what shell

performance, shell, string

I am trying to find the fastest way to fill a string, as in:
str+="A" #one per loop

I came up with this script for bash:

#!/bin/bash
bReport=false
nLimit=${1-3000}; #up to 25000

echo "nLimit='$nLimit'"
shopt -s expand_aliases
nStop=100000;fMaxWorkTime=1.0;
alias GetTime='date +"%s.%N"';
nTimeBegin="`GetTime`";
nDelayPart="`GetTime`";
strFinal="";
str="";
fPartWorkSleep="`bc <<< "scale=10;($fMaxWorkTime/$nStop)*$nLimit"`"
echo "fPartWorkSleep='$fPartWorkSleep'"
nCount=0;
while true;do 
    str+="A";
    ((nCount++))&&:;
    if(((nCount%nLimit)==0)) || ((nCount==nStop));then 
        strFinal+="$str";
        str="";
        if $bReport;then
            echo "`bc <<< "$(GetTime)-$nDelayPart"` #${#strFinal} #`bc <<< "$(GetTime)-$nTimeBegin"`";
            nDelayPart="`GetTime`";
        fi
        sleep "$fPartWorkSleep" # like doing some weighty thing based on the amount of data processed
    fi;
    if((nCount==nStop));then 
        break;
    fi;
done;
echo "strFinal size ${#strFinal}"
echo "took `bc <<< "$(GetTime)-$nTimeBegin"`"

In bash, the best performance-to-size ratio is reached when str is limited to between 3000 and 25000 characters (on my machine). After each part is filled, it must be emptied, and some weighty action can then be performed on the value of str (the weight being relative to its size).

So my question is: which shell has the best string fill performance, based on what I have shown? I am willing to use a shell other than bash just for this kind of algorithm if it proves to be faster.

PS: I had to use the nCount counter because checking the string size on every iteration degraded performance.

Best Answer

for sh  in bash zsh yash dash mksh ksh
do      printf  "\n%s:\t" "$sh"
        time    "$sh" -c '
                        str="some string"
                        set     "" ""
                        while   ${20001+"break"}
                        do      set "$@$@";done
                        IFS=A;  printf %.100000s\\n "$str$*$*$*$*$*"'|
                wc -c
done

bash:   100001
"$sh" -c   0.15s user 0.01s system 94% cpu 0.176 total
wc -c  0.00s user 0.00s system 1% cpu 0.175 total

zsh:    100001
"$sh" -c   0.03s user 0.01s system 97% cpu 0.034 total
wc -c  0.00s user 0.00s system 9% cpu 0.034 total

yash:   100001
"$sh" -c   0.06s user 0.01s system 94% cpu 0.067 total
wc -c  0.00s user 0.00s system 5% cpu 0.067 total

dash:   100001
"$sh" -c   0.02s user 0.01s system 92% cpu 0.029 total
wc -c  0.00s user 0.00s system 11% cpu 0.028 total

ksh:    100001
"$sh" -c   0.02s user 0.00s system 96% cpu 0.021 total
wc -c  0.00s user 0.00s system 16% cpu 0.021 total

So this benches the various shells set to $sh in the for loop on how quickly they can generate a string of 100,000 characters. The first 11 of those 100,000 chars are "some string", as first set to the value of $str, and the tail fill is 99,989 A chars.

The shells get the A chars from $*, which joins the positional parameters in the shell's argument array using the first character of the special shell parameter $IFS as the delimiter. Because all of the arguments are "" (null), the only chars in $* are the delimiter chars.
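As a small illustration of that joining behaviour (a sketch that is not part of the original answer, and the variable name joined is only for illustration), three null arguments leave exactly two delimiters in "$*":

```shell
# Three empty positional parameters: "$*" joins them with the first
# character of IFS, so only the two separators remain.
set -- "" "" ""
IFS=A
joined="$*"
unset IFS                 # go back to default field splitting
printf '%s\n' "$joined"   # prints: AA
```

In general, n null arguments yield n-1 fill characters.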

The arguments are accrued at an exponential rate, with the count roughly doubling on each iteration of the while loop, which only breaks once the $20001 parameter has finally been set (that is what the ${20001+"break"} expansion tests). Until then, basically the while loop does:

### first iteration
while $unset_param; do set "" """" ""; done
### second iteration
while $unset_param; do set "" "" """" "" ""; done
### third iteration
while $unset_param; do set "" "" "" "" """" "" "" "" ""; done

...and so on.
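To see the growth concretely, the following sketch records $# after every pass of the same loop. The variable name counts is an assumption added for illustration. Because "$@$@" fuses the last argument of the first copy with the first argument of the second, each pass turns n arguments into 2n-1 rather than exactly 2n.

```shell
set -- "" ""
counts="$#"
# Same loop as in the benchmark: stop once parameter 20001 exists.
while ${20001+"break"}; do
    set -- "$@$@"
    counts="$counts $#"   # remember the argument count after this pass
done
printf '%s\n' "$counts"
# prints: 2 3 5 9 17 33 65 129 257 513 1025 2049 4097 8193 16385 32769
```

So the loop ends with 32769 arguments after only 15 passes, and each "$*" then contributes 32768 delimiter characters.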

After the while loop completes, $IFS is set to A and the special shell parameter $* is concatenated five times to the tail of $str. The %.100000s format makes printf trim the resulting string argument to a maximum of 100,000 bytes before writing it out to its stdout.
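The trimming step can be tried on its own at a smaller scale. This is only a sketch with made-up sizes (16 instead of 100,000), the names fill and trimmed are assumptions, and it sticks to ASCII so that bytes and characters coincide:

```shell
str="some string"
fill="AAAAAAAAAAAAAAAAAAAA"   # 20 fill characters, more than needed
# %.16s prints at most 16 bytes of its argument; the rest is dropped.
trimmed=$(printf '%.16s' "$str$fill")
printf '%s\n' "$trimmed"      # prints: some stringAAAAA
printf '%s\n' "${#trimmed}"   # prints: 16
```

Overshooting the fill and letting the precision cut it back avoids having to compute the exact number of delimiters in advance.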

One might use the same strategy like:

str='some string'
set "" ""
while ${51+"break"}; do set "$@$@"; done
shift "$((${#}-(51-${#str})))"

...which results in a total argument count of 40 - and so 39 delimiters...

IFS=.; printf %s\\n "$str$*"

some string.......................................

And you can reuse the same arguments you've already set with a different $IFS for a different fill:

for IFS in a b c; do printf %s\\n "$str$*"; done

some stringaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
some stringbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
some stringccccccccccccccccccccccccccccccccccccccc
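The pieces above could be wrapped into a reusable function. The following is a sketch under stated assumptions rather than something from the original answer: the name pad_right and its argument order are hypothetical, and it truncates with printf instead of using shift. The function body runs in a subshell so that the changed $IFS and positional parameters do not leak into the caller.

```shell
# pad_right STRING WIDTH CHAR   (hypothetical helper, for illustration only)
# Prints STRING padded on the right with CHAR to exactly WIDTH characters.
pad_right() (
    str=$1 width=$2 IFS=$3
    set -- "" ""
    # Grow the list of null arguments until "$*" holds at least WIDTH delimiters.
    while [ "$#" -le "$width" ]; do set -- "$@$@"; done
    # Overshoot, then let the precision cut the result back to WIDTH.
    printf "%.${width}s\n" "$str$*"
)

pad_right "some string" 20 .   # prints: some string.........
pad_right ab 5 x               # prints: abxxx
```

Note that a STRING longer than WIDTH is cut down as well, since the precision applies to the whole argument.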

You can also fill in the null arguments with a printf format string rather than using $IFS:

printf "%s m%sy%1ss%st%sr%si%sn%sg" "$str$@"

some string my string my string my string my string my string