Bash – How to use associative arrays safely inside arithmetic expressions

associative array, bash, ksh, security, zsh

A few Bourne-like shells support associative arrays: ksh93 (since 1993), zsh (since 1998), bash (since 2009), though with some differences in behaviour between the 3.

A common use is for counting occurrences of some strings.

However, I find that things like:

typeset -A count
(( count[$var]++ ))

don't work for some values of $var, and I hear it even constitutes an arbitrary command execution vulnerability if the contents of $var are or may be under the control of an attacker.

Why is that? What are the problematic values? And how do I work around it?

Best Answer

The problem is that in a shell arithmetic expression, such as inside $((...)) (POSIX), or ((...)) (ksh/bash/zsh), or array indices or arguments of some shell builtins or [[...]] operands, word expansions (${param}, $((...)), $[...], $(...), `...`, ${ ...; }) are performed first, and then the resulting text is interpreted as an arithmetic expression.

In the case of $((...)), that's even a POSIX requirement.

That allows things like op=+; echo "$(( 1 $op 2 ))" to work, and that explains why a=1+1; echo "$(($a * 2))" outputs 3 instead of 4, as it's the 1+1 * 2 expression that is evaluated.
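As a small sketch of that expand-then-evaluate order (the variable names sum, as_text, grouped and as_name are only illustrative), the following compares using $a as text with using a as a token:

```shell
#!/usr/bin/env bash
# Word expansion happens first; the resulting text is then parsed as arithmetic.
op=+
sum=$(( 1 $op 2 ))        # the text "1 + 2" is evaluated

a=1+1
as_text=$(( $a * 2 ))     # the text "1+1 * 2" is evaluated: 1 + (1 * 2)
grouped=$(( ($a) * 2 ))   # parentheses restore the intended grouping
as_name=$(( a * 2 ))      # "a" is a token; its contents are evaluated on their own first

printf '%s\n' "$sum" "$as_text" "$grouped" "$as_name"
```

Only the $a form lets the contents of the variable merge with the surrounding operators.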

That's also partly why using unsanitised data in arithmetic expressions is a security vulnerability in general.

What is easy to overlook is that it also applies in things like

(( assoc[$var]++ ))

Above, except in ksh93, $var is expanded first, and the result interpreted.

That means that if $var contains @ or *, then the assoc[@]++ or assoc[*]++ expressions are evaluated, and @/* have special meanings there. If $var is x] + 2 + assoc[y, that becomes assoc[x] + 2 + assoc[y].

Now normally, in $(( $var )), even if $var contains something like $(reboot), there is no second round of expansion happening, reboot is not going to be run. But as already seen at Security Implications of using unsanitized data in Shell Arithmetic evaluation, there's an exception if that appears inside word[...] to allow recursive expansion. At the root of the problem is an unfortunate feature of the Korn shell whereby if var contains an arithmetic expression, then in $((var)) the arithmetic expression in $var is being evaluated, even recursively (like when var2='var3 + 1' var='var2 + 1'), something allowed, but not required by POSIX.

As that's extended to array members, that means that the contents of array indexes end up being evaluated recursively. So, if $var is $(reboot), then (( assoc[$var]++ )) ends up calling reboot.

ksh93 seems to have some level of workaround for it, but apparently only when $var doesn't contain $. So, while ksh93 is OK with var=']', var='@', or var='`reboot`', it's not with $(reboot).

As an example, if we replace reboot with the harmless uname>&2:

$ var='1$(uname>&2)' ksh -c 'typeset -A a; (( a[$var]++ )); typeset -p a'
Linux
typeset -A a=([1]=1)
$ var='1$(uname>&2)' bash -c 'typeset -A a; (( a[$var]++ )); typeset -p a'
Linux
Linux
declare -A a=([1]="1" )
$ var='1$(uname>&2)' zsh -c 'typeset -A a; (( a[$var]++ )); typeset -p a'
Linux
Linux
typeset -A a=( [1]=1 )

The uname command does end up being run (twice in bash and zsh, I suppose once for getting the current value and the second time to perform the assignment).

In version 5.0, bash added an assoc_expand_once option which changes the behaviour:

$ var='1$(uname>&2)' bash -O assoc_expand_once -c 'typeset -A a; ((a[$var]++)); typeset -p a'
declare -A a=(["1\$(uname>&2)"]="1" )

is now OK, but it doesn't address the problems with @, *, or ] characters, so it doesn't address the arbitrary command execution vulnerability:

$ var='x]+b[1$(uname>&2)' bash -O assoc_expand_once -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Linux
declare -A a

(this time, uname is run as part of the evaluation of the index of a plain array, b).

The list of problematic characters varies with the shell. $ is a problem for all three, \, `, [ and ] are a problem for bash and zsh, ", ' for bash. Same for the @ and * and empty values. Also note that in some locales, the encoding of some characters does contain that of \, [ or ] at least and could cause problems. Those characters have to be escaped differently in each of the three shells.

To work around it, one can do:

assoc[$var]=$(( ${assoc[$var]} + 1 ))

instead. That is:

  1. do not perform an assignment to an associative array member as part of the arithmetic expression but only perform bare associative array member assignments. In other words, do not use the =, ++, --, +=, /=... arithmetic operators with an associative array member as the target.
  2. when referencing an associative array within an arithmetic expression, do not use assoc[$var], but ${assoc[$var]} (or $assoc[$var] in zsh), or (${assoc[$var]}) if that's meant to contain an arithmetic expression instead of just a number.
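The two rules above can be sketched for bash as follows. The count_key helper and the sample keys are made up for illustration; the point is only that the subscript never appears inside an arithmetic expression:

```shell
#!/usr/bin/env bash
typeset -A count

# count_key is a hypothetical helper name used only for this illustration.
count_key() {
  # ${count[$1]} is expanded to a number (or to nothing the first time)
  # before the arithmetic is parsed; the assignment itself is a plain one.
  count[$1]=$(( ${count[$1]} + 1 ))
}

hostile='1$(uname>&2)'
bracket='x]'
for key in "$hostile" "$bracket" "$hostile" plain; do
  count_key "$key"
done

for key in "${!count[@]}"; do
  printf '%s: %s\n' "$key" "${count[$key]}"
done
```

With those sample keys, uname should not be run and the hostile string should simply end up as a key with a count of 2.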

But, as always, the value of that associative array member must be under your control, preferably a plain number, and like for any other parameter expansion, it's preferable to put whitespace around them. For instance ((1 - $var)) is preferable to ((1-$var)) as the latter would cause problems for negative values (((1--1)) causes a syntax error in some shells as that's the -- operator applied to 1).

Another caveat is that when $var is empty, in (( 1 + var )), that var is still a token in the arithmetic expression syntax, and the corresponding value is 0. But in (( 1 + $var )), the arithmetic expression becomes 1 + which is a syntax error ((( $var + 1 )) is OK though, as that becomes + 1, invoking the unary + operator).
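A short sketch of the whitespace and empty-value caveats, using illustrative variable names and leaving the failing form as a comment only:

```shell
#!/usr/bin/env bash
var=
as_token=$(( 1 + var ))   # var is a token here; an empty variable counts as 0
unary=$(( $var + 1 ))     # the text becomes " + 1": unary plus applied to 1
# $(( 1 + $var )) would become "1 + " at this point and be rejected.

var=-1
spaced=$(( 1 - $var ))    # the text becomes "1 - -1", never "1--1"

printf '%s\n' "$as_token" "$unary" "$spaced"
```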

Other approaches with bash (when the assoc_expand_once option is not enabled) or zsh (but not ksh93 which still has a problem with ] and \ characters), are to delay the expansion until that second, recursive interpretation mentioned above.

  • (( assoc[\$var]++ ))
  • let 'assoc[$var]++' (make sure to use single quotes here)
  • incr='assoc[$var]++'; (($incr)) (or even ((incr)))
  • ((' assoc[$var]++ ')) or (( assoc['$var']++ )) (bash only).

Those have the advantage of preserving the exit status resulting from the arithmetic evaluation (success if non-zero), so one can do things like:

if (( assoc[\$var]++ )); then
  printf '%s\n' "$var was already seen"
fi
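Where the delayed-expansion forms are not an option (for instance because assoc_expand_once may be enabled), a similar "already seen" test can be sketched for bash with the plain-assignment workaround instead. This is a different technique from the one-liners above, and first_seen is a hypothetical helper name:

```shell
#!/usr/bin/env bash
typeset -A seen

# first_seen (hypothetical helper) succeeds only the first time a key is seen.
first_seen() {
  local previous=${seen[$1]:-0}    # a number stored by this function, or 0
  seen[$1]=$(( previous + 1 ))     # plain assignment; the key stays outside the arithmetic
  (( previous == 0 ))              # the exit status reports whether the key was new
}

unique=()
for item in 'b' 'x]' 'b' '1$(uname>&2)' 'x]'; do
  if first_seen "$item"; then
    unique+=("$item")
  fi
done

printf '%s\n' "${unique[@]}"
```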

Now, that leaves one problem specific to the bash shell: bash associative arrays don't support empty keys. While assoc[]=x fails in both bash and zsh (not ksh93), assoc[$var] when $var is empty works in zsh or ksh93 but not bash. Even the zsh-style assoc+=('' value) syntax, supported by bash since 5.1, doesn't work in bash with an empty key.

So if working with bash specifically and if an empty key is one of the possible values, the only option is to add a fixed prefix/suffix. So use for instance:

assoc[.$var]=$(( ${assoc[.$var]} + 1 ))

Or:

let 'assoc[.$var]++'
(( assoc[.\$var]++ ))
...
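When a prefix is used, it has to be stripped again when the keys are read back. A minimal bash sketch, with illustrative names and the same "." prefix as above:

```shell
#!/usr/bin/env bash
typeset -A count

for key in '' 'a' '' '1$(uname>&2)'; do
  # The fixed "." prefix keeps the stored key non-empty even when $key is empty.
  count[.$key]=$(( ${count[.$key]} + 1 ))
done

empty_count=${count[.]}   # the empty string is stored under the key "."

for stored in "${!count[@]}"; do
  # ${stored#.} removes the leading "." to recover the original key.
  printf '<%s> -> %s\n' "${stored#.}" "${count[$stored]}"
done
```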