Shell Script – Are Compiled Shell Scripts Better for Performance?

compilercompilingperformanceshell

After some googling, I found a way to compile BASH scripts to binary executables (using shc).

I know that shell is an interpreted language, but what does this compiler do? Will it improve the performance of my script in any way?

Best Answer

To answer the question in your title, compiled shell scripts could be better for performance — if the result of the compilation represented the result of the interpretation, without having to re-interpret the commands in the script over and over. See for instance ksh93's shcomp or zsh's zcompile.

However, shc doesn’t compile scripts in this way. It’s not really a compiler, it’s a script “encryption” tool with various protection techniques of dubious effectiveness. When you compile a script with shc, the result is a binary whose contents aren’t immediately readable; when it runs, it decrypts its contents, and runs the tool the script was intended for with the decrypted script, making the original script easy to retrieve (it’s passed in its entirety on the interpreter’s command line, with extra spacing in an attempt to make it harder to find). So the overall performance will always be worse: on top of the time taken to run the original script, there’s the time taken to set the environment up and decrypt the script.

Setuid shebang

There is a race condition inherent to the way shebang (#!) is typically implemented:

The kernel opens the executable, and finds that it starts with #!.
The kernel closes the executable and opens the interpreter instead.
The kernel inserts the path to the script to the argument list (as argv[1]), and executes the interpreter.

If setuid scripts are allowed with this implementation, an attacker can invoke an arbitrary script by creating a symbolic link to an existing setuid script, executing it, and arranging to change the link after the kernel has performed step 1 and before the interpreter gets around to opening its first argument. For this reason, most unices ignore the setuid bit when they detect a shebang.

One way to secure this implementation would be for the kernel to lock the script file until the interpreter has opened it (note that this must prevent not only unlinking or overwriting the file, but also renaming any directory in the path). But unix systems tend to shy away from mandatory locks, and symbolic links would make a correct lock feature especially difficult and invasive. I don't think anyone does it this way.

A few unix systems (mainly OpenBSD, NetBSD and Mac OS X, all of which require a kernel setting to be enabled) implement secure setuid shebang using an additional feature: the path /dev/fd/N refers to the file already opened on file descriptor N (so opening /dev/fd/N is roughly equivalent to dup(N)). Many unix systems (including Linux) have /dev/fd but not setuid scripts.

The kernel opens the executable, and finds that it starts with #!. Let's say the file descriptor for the executable is 3.
The kernel opens the interpreter.
The kernel inserts /dev/fd/3 the argument list (as argv[1]), and executes the interpreter.

Sven Mascheck's shebang page has a lot of information on shebang across unices, including setuid support.

Setuid interpreters

Let's assume you've managed to make your program run as root, either because your OS supports setuid shebang or because you've used a native binary wrapper (such as sudo). Have you opened a security hole? Maybe. The issue here is not about interpreted vs compiled programs. The issue is whether your runtime system behaves safely if executed with privileges.

Any dynamically linked native binary executable is in a way interpreted by the dynamic loader (e.g. /lib/ld.so), which loads the dynamic libraries required by the program. On many unices, you can configure the search path for dynamic libraries through the environment (LD_LIBRARY_PATH is a common name for the environment variable), and even load additional libraries into all executed binaries (LD_PRELOAD). The invoker of the program can execute arbitrary code in that program's context by placing a specially-crafted libc.so in $LD_LIBRARY_PATH (amongst other tactics). All sane systems ignore the LD_* variables in setuid executables.
In shells such as sh, csh and derivatives, environment variables automatically become shell parameters. Through parameters such as PATH, IFS, and many more, the invoker of the script has many opportunities to execute arbitrary code in the shell scripts's context. Some shells set these variables to sane defaults if they detect that the script has been invoked with privileges, but I don't know that there is any particular implementation that I would trust.
Most runtime environments (whether native, bytecode or interpreted) have similar features. Few take special precautions in setuid executables, though the ones that run native code often don't do anything fancier than dynamic linking (which does take precautions).
Perl is a notable exception. It explicitly supports setuid scripts in a secure way. In fact, your script can run setuid even if your OS ignored the setuid bit on scripts. This is because perl ships with a setuid root helper that performs the necessary checks and reinvokes the interpreter on the desired scripts with the desired privileges. This is explained in the perlsec manual. It used to be that setuid perl scripts needed #!/usr/bin/suidperl -wT instead of #!/usr/bin/perl -wT, but on most modern systems, #!/usr/bin/perl -wT is sufficient.

Note that using a native binary wrapper does nothing in itself to prevent these problems. In fact, it can make the situation worse, because it might prevent your runtime environment from detecting that it is invoked with privileges and bypassing its runtime configurability.

A native binary wrapper can make a shell script safe if the wrapper sanitizes the environment. The script must take care not to make too many assumptions (e.g. about the current directory) but this goes. You can use sudo for this provided that it's set up to sanitize the environment. Blacklisting variables is error-prone, so always whitelist. With sudo, make sure that the env_reset option is turned on, that setenv is off, and that env_file and env_keep only contain innocuous variables.

TL,DR:

Setuid shebang is insecure but usually ignored.
If you run a program with privileges (either through sudo or setuid), write native code or perl, or start the program with a wrapper that sanitizes the environment (such as sudo with the env_reset option).

¹ _{This discussion applies equally if you substitute “setgid” for “setuid”; they are both ignored by the Linux kernel on scripts}

Shell – string fill performance in shell scripts: what is the best amount and for what shell

for sh  in bash zsh yash dash mksh ksh
do      printf  "\n%s:\t" "$sh"
        time    "$sh" -c '
                        str="some string"
                        set     "" ""
                        while   ${20001+"break"}
                        do      set "$@$@";done
                        IFS=A;  printf %.100000s\\n "$str$*$*$*$*$*"'|
                wc -c
done

bash:   100001
"$sh" -c   0.15s user 0.01s system 94% cpu 0.176 total
wc -c  0.00s user 0.00s system 1% cpu 0.175 total

zsh:    100001
"$sh" -c   0.03s user 0.01s system 97% cpu 0.034 total
wc -c  0.00s user 0.00s system 9% cpu 0.034 total

yash:   100001
"$sh" -c   0.06s user 0.01s system 94% cpu 0.067 total
wc -c  0.00s user 0.00s system 5% cpu 0.067 total

dash:   100001
"$sh" -c   0.02s user 0.01s system 92% cpu 0.029 total
wc -c  0.00s user 0.00s system 11% cpu 0.028 total

ksh:    100001
"$sh" -c   0.02s user 0.00s system 96% cpu 0.021 total
wc -c  0.00s user 0.00s system 16% cpu 0.021 total

So this benches the various shells set to $sh in the for loop on how quickly they can generate a string of 100,000 characters. The first 11 of those 100,000 chars are some string as is first set to the value of $str, but the tail fill is 999,989 A chars.

The shells get the A chars in $* which substitutes in the first character in the value of the special shell parameter $IFS as a concatenation delimiter between every positional parameter in shell's argument array. Because all of the arguments are "" null, the only chars in $* are the delimiter chars.

The arguments are accrued at an exponential rate for each iteration of the while loop - which only breaks when the $20001 parameter has finally been ${set+}. Until then, basically the while loop does:

### first iteration
while $unset_param; do set "" """" ""; done
### second iteration
while $unset_param; do set "" "" """" "" ""; done
### third iteration
while $unset_param; do set "" "" "" "" """" "" "" "" ""; done

...and so on.

After the while loop completes $IFS is set to A and the special shell parameter $* is concatenated five times to the tail of $str. printf trims the resulting %string argument to a maximum of .100000 bytes before writing it out to its stdout.

One might use the same strategy like:

str='some string'
set "" ""
while ${51+"break"}; do set "$@$@"; done
shift "$((${#}-(51-${#str}))"

...which results in a total argument count of 40 - and so 39 delimiters...

IFS=.; printf %s\\n "$str$*"

some string.......................................

And you can reuse the same arguments you've already set w/ a different $IFS for a different fill:

for IFS in a b c; do printf %s\\n "$str$*"; done

some stringaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
some stringbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
some stringccccccccccccccccccccccccccccccccccccccc

You can also fill in the null arguments with a printf format string rather than using $IFS:

printf "%s m%sy%1ss%st%sr%si%sn%sg" "$str$@"

some string my string my string my string my string my string

Best Answer

Related Solutions

Shell – Allow setuid on shell scripts

Setuid shebang

Setuid interpreters

Shell – string fill performance in shell scripts: what is the best amount and for what shell

Related Question