After some googling, I found a way to compile BASH scripts to binary executables (using shc).
I know that shell is an interpreted language, but what does this compiler do? Will it improve the performance of my script in any way?
Linux ignores the setuid¹ bit on all interpreted executables (i.e. executables starting with a #! line). The comp.unix.questions FAQ explains the security problems with setuid shell scripts. These problems are of two kinds: shebang-related and shell-related; I go into more details below.
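You can see this for yourself with a quick test (a hypothetical demo; run the first two commands as root, and use a filesystem that isn't mounted nosuid so the test is meaningful):

    printf '#!/bin/sh\nid -u\n' > /tmp/demo.sh   # a script that prints its effective uid
    chmod 4755 /tmp/demo.sh                      # set the setuid bit (as root)
    su nobody -s /bin/sh -c '/tmp/demo.sh'       # prints nobody's uid, not 0: the bit is ignored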
If you don't care about security and want to allow setuid scripts under Linux, you'll need to patch the kernel. As of 3.x kernels, I think you need to add a call to install_exec_creds in the load_script function, before the call to open_exec, but I haven't tested.
There is a race condition inherent to the way shebang (#!) is typically implemented:

1. The kernel opens the executable and finds that it starts with #!.
2. The kernel closes the executable and opens the interpreter instead.
3. The kernel inserts the path of the script into the argument list (as argv[1]), and executes the interpreter.

If setuid scripts are allowed with this implementation, an attacker can invoke an arbitrary script by creating a symbolic link to an existing setuid script, executing it, and arranging to change the link after the kernel has performed step 1 and before the interpreter gets around to opening its first argument. For this reason, most unices ignore the setuid bit when they detect a shebang.
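Schematically, the attack might look like this (a hypothetical sketch on a system that did honour setuid on scripts; all paths are made up):

    ln -s /opt/bin/privileged.sh /tmp/victim   # point a link at a real setuid script
    /tmp/victim &                              # kernel resolves /tmp/victim, sees #!, starts the interpreter
    ln -sf /tmp/attacker.sh /tmp/victim        # swap the target before the interpreter opens argv[1]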
One way to secure this implementation would be for the kernel to lock the script file until the interpreter has opened it (note that this must prevent not only unlinking or overwriting the file, but also renaming any directory in the path). But unix systems tend to shy away from mandatory locks, and symbolic links would make a correct lock feature especially difficult and invasive. I don't think anyone does it this way.
A few unix systems (mainly OpenBSD, NetBSD and Mac OS X, all of which require a kernel setting to be enabled) implement secure setuid shebang using an additional feature: the path /dev/fd/N refers to the file already opened on file descriptor N (so opening /dev/fd/N is roughly equivalent to dup(N)). Many unix systems (including Linux) have /dev/fd but not setuid scripts.
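You can observe the /dev/fd behaviour from an ordinary shell (a minimal sketch; /etc/hostname is just an arbitrary readable file):

    exec 3< /etc/hostname   # open a file on descriptor 3
    cat /dev/fd/3           # reads through the already-open descriptor, roughly like dup(3)
    exec 3<&-               # close descriptor 3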
With this feature, secure setuid shebang works as follows:

1. The kernel opens the executable and finds that it starts with #!. Let's say the file descriptor for the executable is 3.
2. The kernel opens the interpreter.
3. The kernel inserts /dev/fd/3 into the argument list (as argv[1]), and executes the interpreter.

Sven Mascheck's shebang page has a lot of information on shebang across unices, including setuid support.
Let's assume you've managed to make your program run as root, either because your OS supports setuid shebang or because you've used a native binary wrapper (such as sudo). Have you opened a security hole? Maybe. The issue here is not about interpreted vs compiled programs. The issue is whether your runtime system behaves safely if executed with privileges.
Any dynamically linked native binary executable is in a way interpreted by the dynamic loader (e.g. /lib/ld.so), which loads the dynamic libraries required by the program. On many unices, you can configure the search path for dynamic libraries through the environment (LD_LIBRARY_PATH is a common name for the environment variable), and even load additional libraries into all executed binaries (LD_PRELOAD). The invoker of the program can execute arbitrary code in that program's context by placing a specially-crafted libc.so in $LD_LIBRARY_PATH (amongst other tactics). All sane systems ignore the LD_* variables in setuid executables.
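For a non-privileged program, the injection is as simple as this sketch (evil.so and the program names are hypothetical; setuid binaries are safe precisely because the loader drops these variables):

    LD_PRELOAD=/tmp/evil.so /usr/bin/some-program       # load extra code into the process
    LD_LIBRARY_PATH=/tmp /usr/bin/some-dynamic-binary   # resolve shared libraries from /tmp first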
In shells such as sh, csh and derivatives, environment variables automatically become shell parameters. Through parameters such as PATH, IFS, and many more, the invoker of the script has many opportunities to execute arbitrary code in the shell script's context. Some shells set these variables to sane defaults if they detect that the script has been invoked with privileges, but I don't know of any particular implementation that I would trust.
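As a concrete illustration, if a privileged script runs an unqualified command such as ls, the invoker can supply their own (an attacker-side sketch; vulnerable-script is hypothetical):

    mkdir -p /tmp/evilbin
    printf '#!/bin/sh\ncp /bin/sh /tmp/rootsh && chmod u+s /tmp/rootsh\n' > /tmp/evilbin/ls
    chmod +x /tmp/evilbin/ls                  # a fake ls that plants a setuid shell
    PATH=/tmp/evilbin:$PATH vulnerable-script # the privileged script now runs the fake ls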
Most runtime environments (whether native, bytecode or interpreted) have similar features. Few take special precautions in setuid executables, though the ones that run native code often don't do anything fancier than dynamic linking (which does take precautions).
Perl is a notable exception. It explicitly supports setuid scripts in a secure way. In fact, your script can run setuid even if your OS ignores the setuid bit on scripts. This is because perl ships with a setuid root helper that performs the necessary checks and reinvokes the interpreter on the desired scripts with the desired privileges. This is explained in the perlsec manual. It used to be that setuid perl scripts needed #!/usr/bin/suidperl -wT instead of #!/usr/bin/perl -wT, but on most modern systems, #!/usr/bin/perl -wT is sufficient.
Note that using a native binary wrapper does nothing in itself to prevent these problems. In fact, it can make the situation worse, because it might prevent your runtime environment from detecting that it is invoked with privileges and bypassing its runtime configurability.
A native binary wrapper can make a shell script safe if the wrapper sanitizes the environment. The script must take care not to make too many assumptions (e.g. about the current directory), but this is doable. You can use sudo for this provided that it's set up to sanitize the environment. Blacklisting variables is error-prone, so always whitelist. With sudo, make sure that the env_reset option is turned on, that setenv is off, and that env_file and env_keep only contain innocuous variables.
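In sudoers terms, that might look like the following (a sketch; see the sudoers(5) manual, and adapt the env_keep list to your own policy):

    Defaults env_reset                 # start each command from a clean environment
    Defaults !setenv                   # don't let users pass arbitrary VAR=value through sudo
    Defaults env_keep = "LANG LC_ALL"  # whitelist only innocuous variables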
TL,DR:

- Ignoring the setuid bit on scripts is necessary because of the race condition in the usual shebang implementation.
- Even when the system does run your script with privileges, the shell or other runtime environment is generally not safe to run privileged.
- If you need a privileged script, run it through a wrapper that sanitizes the environment, such as sudo (with the env_reset option).

¹ This discussion applies equally if you substitute "setgid" for "setuid"; they are both ignored by the Linux kernel on scripts.
    for sh in bash zsh yash dash mksh ksh
    do  printf "\n%s:\t" "$sh"
        time "$sh" -c '
            str="some string"
            set "" ""
            while ${20001+"break"}
            do set "$@$@"; done
            IFS=A; printf %.100000s\\n "$str$*$*$*$*$*"' |
        wc -c
    done
    bash: 100001
    "$sh" -c  0.15s user 0.01s system 94% cpu 0.176 total
    wc -c  0.00s user 0.00s system 1% cpu 0.175 total

    zsh: 100001
    "$sh" -c  0.03s user 0.01s system 97% cpu 0.034 total
    wc -c  0.00s user 0.00s system 9% cpu 0.034 total

    yash: 100001
    "$sh" -c  0.06s user 0.01s system 94% cpu 0.067 total
    wc -c  0.00s user 0.00s system 5% cpu 0.067 total

    dash: 100001
    "$sh" -c  0.02s user 0.01s system 92% cpu 0.029 total
    wc -c  0.00s user 0.00s system 11% cpu 0.028 total

    ksh: 100001
    "$sh" -c  0.02s user 0.00s system 96% cpu 0.021 total
    wc -c  0.00s user 0.00s system 16% cpu 0.021 total
So this benchmarks the various shells set to $sh in the for loop on how quickly they can generate a string of 100,000 characters. The first 11 of those 100,000 chars are some string, the value first assigned to $str; the tail fill is the remaining 99,989 A chars.
The shells get the A chars from $*, which substitutes the first character of the value of the special shell variable $IFS as a concatenation delimiter between every positional parameter in the shell's argument array. Because all of the arguments are "" null, the only chars in $* are the delimiter chars.
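Here is a minimal standalone demo of that effect:

    set -- "" "" "" ""          # four empty positional parameters
    IFS=-; printf %s\\n "$*"    # prints ---: three delimiters and nothing else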
The arguments accrue at an exponential rate with each iteration of the while loop, which only breaks when the positional parameter $20001 has finally been set (the ${20001+"break"} expansion is demonstrated on its own just after the illustration below). Until then, the while loop basically does:
    ### first iteration
    while $unset_param; do set "" """" ""; done
    ### second iteration
    while $unset_param; do set "" "" """" "" ""; done
    ### third iteration
    while $unset_param; do set "" "" "" "" """" "" "" "" ""; done
...and so on.
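The ${N+word} expansion that drives the loop condition can be demonstrated on its own (a minimal sketch):

    set -- a b
    echo ${2+"param is set"}   # $2 exists, so this expands to: param is set
    echo ${3+"param is set"}   # $3 is unset, so this expands to nothing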
After the while loop completes, $IFS is set to A and the special parameter $* is concatenated five times onto the tail of $str. printf trims the resulting %s string argument to a maximum of 100,000 bytes (that's the .100000 precision) before writing it out to its stdout.
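The precision field works the same way on any string argument; a quick demo:

    printf %.4s\\n 'some string'   # prints some: at most 4 bytes of the argument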
One might use the same strategy like this:
    str='some string'
    set "" ""
    while ${51+"break"}; do set "$@$@"; done
    shift "$((${#}-(51-${#str})))"

...which results in a total argument count of 40, and so 39 delimiters...

    IFS=.; printf %s\\n "$str$*"
    some string.......................................
And you can reuse the same arguments you've already set with a different $IFS for a different fill:
    for IFS in a b c; do printf %s\\n "$str$*"; done
    some stringaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    some stringbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    some stringccccccccccccccccccccccccccccccccccccccc
You can also fill in the null arguments with a printf format string rather than using $IFS:
    printf "%s m%sy%1ss%st%sr%si%sn%sg" "$str$@"
    some string my string my string my string my string my string
Best Answer
To answer the question in your title, compiled shell scripts could be better for performance, if the result of the compilation represented the result of the interpretation, without having to re-interpret the commands in the script over and over. See for instance ksh93's shcomp or zsh's zcompile.

However, shc doesn't compile scripts in this way. It's not really a compiler; it's a script "encryption" tool with various protection techniques of dubious effectiveness. When you compile a script with shc, the result is a binary whose contents aren't immediately readable; when it runs, it decrypts its contents, and runs the tool the script was intended for with the decrypted script, making the original script easy to retrieve (it's passed in its entirety on the interpreter's command line, with extra spacing in an attempt to make it harder to find). So the overall performance will always be worse: on top of the time taken to run the original script, there's the time taken to set the environment up and decrypt the script.
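For comparison, here is what real shell-script compilation looks like with the tools mentioned above (a sketch; the file names are illustrative):

    shcomp myscript.sh myscript.shbin   # ksh93: compile the script to shell bytecode
    ksh myscript.shbin                  # ksh93 detects and runs the compiled form
    zcompile myscript.zsh               # zsh builtin: writes myscript.zsh.zwc
    source ./myscript.zsh               # zsh uses the newer .zwc when sourcing

These avoid re-parsing the script text on each run, which is the kind of compilation that could actually help performance.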
, the result is a binary whose contents aren’t immediately readable; when it runs, it decrypts its contents, and runs the tool the script was intended for with the decrypted script, making the original script easy to retrieve (it’s passed in its entirety on the interpreter’s command line, with extra spacing in an attempt to make it harder to find). So the overall performance will always be worse: on top of the time taken to run the original script, there’s the time taken to set the environment up and decrypt the script.