Bash – How to use associative arrays safely inside arithmetic expressions


A few Bourne-like shells support associative arrays: ksh93 (since 1993), zsh (since 1998) and bash (since 2009), though their behaviour differs in places between the three.

A common use is for counting occurrences of some strings.

However, I find that things like:

typeset -A count
(( count[$var]++ ))

don't work for some values of $var, and I hear it even constitutes an arbitrary command execution vulnerability if the contents of $var are or may be under the control of an attacker.

Why is that? What are the problematic values? And how do I work around it?

Best Answer

The problem is that in a shell arithmetic expression, such as inside $((...)) (POSIX), or ((...)) (ksh/bash/zsh), or array indices or arguments of some shell builtins or [[...]] operands, word expansions (${param}, $((...)), $[...], $(...), `...`, ${ ...; }) are performed first, and then the resulting text is interpreted as an arithmetic expression.

In the case of $((...)), that's even a POSIX requirement.

That allows things like op=+; echo "$(( 1 $op 2 ))" to work, and that explains why a=1+1; echo "$(($a * 2))" outputs 3 instead of 4, as it's the 1+1 * 2 expression that is evaluated.
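A small bash sketch of that expand-then-parse order, reusing the variable names from the text (bash is assumed). The third assignment is only there for contrast: when a is referenced by name rather than as $a, bash evaluates its value as a sub-expression of its own, so precedence is no longer broken.

```bash
#!/usr/bin/env bash
# The text of $op is pasted into the expression before it is parsed
op=+
sum=$(( 1 $op 2 ))       # parsed as: 1 + 2
# $a expands to the text 1+1, so the expression parsed is: 1+1 * 2
a=1+1
product=$(( $a * 2 ))    # 1 + (1 * 2)
# By name, the value of a is evaluated on its own first: (1+1) * 2
by_name=$(( a * 2 ))
printf '%s %s %s\n' "$sum" "$product" "$by_name"   # → 3 3 4
```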

That's also partly why using unsanitised data in arithmetic expressions is a security vulnerability in general.

What is easy to overlook is that it also applies in things like

(( assoc[$var]++ ))

Above, except in ksh93, $var is expanded first, and the result is then interpreted as an arithmetic expression.

That means that if $var contains @ or *, then the assoc[@]++ or assoc[*]++ expressions are evaluated, and @/* have special meanings there. If $var is x] + 2 + assoc[y, that becomes assoc[x] + 2 + assoc[y].
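The ] case can be observed without running any command. In this bash sketch (the array name and key text are only illustrative), the expansion of $var rewrites the expression so that ++ ends up applied to a different member, and nothing is stored under the literal contents of $var.

```bash
#!/usr/bin/env bash
typeset -A assoc
var='x] + 2 + assoc[y'
# After expansion, bash evaluates: assoc[x] + 2 + assoc[y]++
# (value 2, so the command succeeds)
(( assoc[$var]++ ))
# The only member is now assoc[y]; no key equal to "$var" exists
typeset -p assoc
```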

Now normally, in $(( $var )), even if $var contains something like $(reboot), there is no second round of expansion, so reboot is not going to be run. But, as already seen at Security Implications of using unsanitized data in Shell Arithmetic evaluation, there's an exception when that appears inside word[...], to allow recursive expansion. At the root of the problem is an unfortunate feature of the Korn shell whereby, if var contains an arithmetic expression, then in $((var)) the arithmetic expression in $var is evaluated, even recursively (as when var2='var3 + 1' var='var2 + 1'), something allowed, but not required, by POSIX.
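A harmless bash sketch of that recursive evaluation, using the var/var2/var3 chain from the paragraph above (var3=1 is an assumed starting value):

```bash
#!/usr/bin/env bash
var3=1
var2='var3 + 1'
var='var2 + 1'
# The value of var is parsed as an expression, which refers to var2,
# whose value is in turn parsed and refers to var3
result=$(( var ))
echo "$result"   # → 3
```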

As that's extended to array members, that means that the contents of array indexes end up being evaluated recursively. So, if $var is $(reboot), then (( assoc[$var]++ )) ends up calling reboot.

ksh93 seems to have some level of workaround for it, but only when $var doesn't contain $. So, while ksh93 is OK with var=']', var='@', or var='`reboot`', it's not with $(reboot).

As an example, if we replace reboot with the harmless uname>&2:

$ var='1$(uname>&2)' ksh -c 'typeset -A a; (( a[$var]++ )); typeset -p a'
Linux
typeset -A a=([1]=1)
$ var='1$(uname>&2)' bash -c 'typeset -A a; (( a[$var]++ )); typeset -p a'
Linux
Linux
declare -A a=([1]="1" )
$ var='1$(uname>&2)' zsh -c 'typeset -A a; (( a[$var]++ )); typeset -p a'
Linux
Linux
typeset -A a=( [1]=1 )

The uname command does end up being run (twice in bash and zsh, I suppose once for getting the current value and the second time to perform the assignment).

In version 5.0, bash added an assoc_expand_once option which changes the behaviour:

$ var='1$(uname>&2)' bash -O assoc_expand_once -c 'typeset -A a; ((a[$var]++)); typeset -p a'
declare -A a=(["1\$(uname>&2)"]="1" )

That is now OK, but the option doesn't address the problems with the @, * or ] characters, so it doesn't fix the arbitrary command execution vulnerability:

$ var='x]+b[1$(uname>&2)' bash -O assoc_expand_once -c 'typeset -A a; ((a[$var]++)); typeset -p a'
Linux
declare -A a

(This time, uname is run as part of evaluating the index of the plain array b.)
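A bash sketch of both halves of that claim, assuming bash 5.0 or newer (where assoc_expand_once exists). The a and b arrays and both key strings are illustrative; the command substitution in the first key only writes to stderr, and with the option set it should not run at all.

```bash
#!/usr/bin/env bash
shopt -s assoc_expand_once   # bash 5.0+
typeset -A a b
# With the option, a $(...) in the key is no longer expanded a second time
var='k$(echo ran >&2)'
(( a[$var]++ ))              # key stored literally as: k$(echo ran >&2)
# But the first expansion still lets ] rewrite the expression
var='x] + 2 + b[y'
(( b[$var]++ ))              # evaluated as: b[x] + 2 + b[y]++
typeset -p a b
```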

The list of problematic characters varies with the shell: $ is a problem for all three; \, `, [ and ] are a problem for bash and zsh; " and ' for bash. The same goes for @, * and empty values. Also note that in some locales, the encoding of some characters contains that of \, [ or ] (at least), which could cause problems. Escaping those would have to be done differently in each of the three shells.

To work around it, one can do:

assoc[$var]=$(( ${assoc[$var]} + 1 ))

instead. That is:

do not perform an assignment to an associative array member as part of the arithmetic expression but only perform bare associative array member assignments. In other words, do not use the =, ++, --, +=, /=... arithmetic operators with an associative array member as the target.
when referencing an associative array within an arithmetic expression, do not use assoc[$var], but ${assoc[$var]} (or $assoc[$var] in zsh), or (${assoc[$var]}) if that's meant to contain an arithmetic expression instead of just a number.
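A bash sketch of that workaround applied to a counting loop. The key list is only an illustrative sample of troublesome values from the text; empty keys are left out because bash rejects them (see below). Every key is counted twice, and none of the $(...) or `...` fragments should be executed.

```bash
#!/usr/bin/env bash
typeset -A count
keys=( '@' '*' 'x] + 2 + count[y' 'k$(echo ran >&2)' '`echo ran >&2`' "a'b" 'a"b' '\' '[' )
for var in "${keys[@]}" "${keys[@]}"; do
  # Bare member assignment; the old value is read with ${...} and an
  # unset member expands to nothing, giving "$((  + 1 ))", i.e. 1
  count[$var]=$(( ${count[$var]} + 1 ))
done
for var in "${keys[@]}"; do
  printf '%s -> %s\n' "$var" "${count[$var]}"
done
```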

But, as always, the value of that associative array member must be under your control, preferably a plain number, and, as for any other parameter expansion, it's preferable to put whitespace around it. For instance, ((1 - $var)) is preferable to ((1-$var)), as the latter would cause problems for negative values: ((1--1)) causes a syntax error in some shells, as that's the -- operator applied to 1.

Another caveat is that when $var is empty, in (( 1 + var )), var is still a token in the arithmetic expression syntax, and the corresponding value is 0. But in (( 1 + $var )), the arithmetic expression becomes 1 +, which is a syntax error. (( $var + 1 )) is OK though, as that becomes + 1, invoking the unary + operator.
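The three cases side by side in bash (a sketch; the syntax-error case runs in a subshell so that the expansion error cannot abort the rest of the script):

```bash
#!/usr/bin/env bash
var=
a=$(( 1 + var ))     # var is still an operand; an empty value counts as 0
b=$(( $var + 1 ))    # becomes " + 1": unary plus
# "1 + " is incomplete, so this expansion fails
if ( : "$(( 1 + $var ))" ) 2>/dev/null; then c=ok; else c=error; fi
echo "$a $b $c"      # → 1 1 error
```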

Other approaches with bash (when the assoc_expand_once option is not enabled) or zsh (but not ksh93, which still has a problem with ] and \ characters) are to delay the expansion until that second, recursive interpretation mentioned above:

(( assoc[\$var]++ ))
let 'assoc[$var]++' (make sure to use single quotes here)
incr='assoc[$var]++'; (($incr)) (or even ((incr)))
((' assoc[$var]++ ')) or (( assoc['$var']++ )) (bash only).
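A bash sketch exercising three of those forms on one illustrative key, assuming assoc_expand_once is not enabled (the default). The first form is wrapped in an if, which also shows the exit status at work: a post-increment from 0 yields 0, i.e. a failure status.

```bash
#!/usr/bin/env bash
typeset -A assoc
var='x] + 2 + assoc[y'
# 1: escaped $, so the arithmetic evaluator sees assoc[$var]++ and
#    expands $var only when resolving the subscript
if (( assoc[\$var]++ )); then echo "seen before"; else echo "first time"; fi
# 2: single quotes pass the same unexpanded text to let
let 'assoc[$var]++'
# 3: the expression is stored in a variable and evaluated by name
incr='assoc[$var]++'
((incr))
# A single member, keyed by the literal contents of $var
printf '%s -> %s\n' "$var" "${assoc[$var]}"
```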

Those have the advantage of preserving the exit status resulting from the arithmetic evaluation (success if the result is non-zero), so one can do things like:

if (( assoc[\$var]++ )); then
  printf '%s\n' "$var was already seen"
fi

Now, that leaves one problem specific to the bash shell: bash associative arrays don't support empty keys. While assoc[]=x fails in both bash and zsh (not ksh93), assoc[$var] with an empty $var works in zsh and ksh93 but not in bash. Even the zsh-style assoc+=('' value) syntax, now supported by bash 5.1, doesn't work with an empty key in bash.

So if working with bash specifically and if an empty key is one of the possible values, the only option is to add a fixed prefix/suffix. So use for instance:

assoc[.$var]=$(( ${assoc[.$var]} + 1 ))

Or:

let 'assoc[.$var]++'
(( assoc[.\$var]++ ))
...
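Putting the prefix and the delayed expansion together, here is a bash sketch of a filter that prints each input line only the first time it appears. The function name dedup is hypothetical; the "." prefix is what lets an empty line be used as a key in bash.

```bash
#!/usr/bin/env bash
# dedup (hypothetical): print each line of stdin only on its first occurrence
dedup() {
  local -A seen=()
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    # Delayed expansion plus a fixed "." prefix, so "" becomes the key "."
    if ! (( seen[.\$line]++ )); then
      printf '%s\n' "$line"
    fi
  done
}
out=$(printf '%s\n' a '' '@' a '' 'x] + 1 + seen[y' '@' | dedup)
printf '%s\n' "$out"
```

Each distinct line, including the empty one and the hostile-looking last one, should come out exactly once, in order of first appearance.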

Related Solutions

Shell – Associative Arrays in Shell Scripts

Shells with associative arrays

Some modern shells provide associative arrays: ksh93, bash ≥4, zsh. In ksh93 and bash, if a is an associative array, then "${!a[@]}" is the array of its keys:

for k in "${!a[@]}"; do
  echo "$k -> ${a[$k]}"
done

In zsh, that syntax only works in ksh emulation mode. Otherwise you have to use zsh's native syntax:

for k in "${(@k)a}"; do
  echo "$k -> $a[$k]"
done

${(k)a} also works if a does not have an empty key.

In zsh, you could also loop on both keys and values at the same time:

for k v ("${(@kv)a}") echo "$k -> $v"

Shells without associative arrays

Emulating associative arrays in shells that don't have them is a lot more work. If you need associative arrays, it's probably time to bring in a bigger tool, such as ksh93 or Perl.

If you do need associative arrays in a mere POSIX shell, here's a way to simulate them, when keys are restricted to contain only the characters 0-9A-Z_a-z (ASCII digits, letters and underscore). Under this assumption, keys can be used as part of variable names. The functions below act on an array identified by a naming prefix, the “stem”, which must not contain two consecutive underscores.

## ainit STEM
## Declare an empty associative array named STEM.
ainit () {
  eval "__aa__${1}=' '"
}
## akeys STEM
## List the keys in the associative array named STEM.
akeys () {
  eval "echo \"\$__aa__${1}\""
}
## aget STEM KEY VAR
## Set VAR to the value of KEY in the associative array named STEM.
## If KEY is not present, unset VAR.
aget () {
  eval "unset $3
        case \$__aa__${1} in
          *\" $2 \"*) $3=\$__aa__${1}__$2;;
        esac"
}
## aset STEM KEY VALUE
## Set KEY to VALUE in the associative array named STEM.
aset () {
  eval "__aa__${1}__${2}=\$3
        case \$__aa__${1} in
          *\" $2 \"*) :;;
          *) __aa__${1}=\"\${__aa__${1}}$2 \";;
        esac"
}
## aunset STEM KEY
## Remove KEY from the associative array named STEM.
aunset () {
  eval "unset __aa__${1}__${2}
        case \$__aa__${1} in
          *\" $2 \"*) __aa__${1}=\"\${__aa__${1}%%* $2 } \${__aa__${1}#* $2 }\";;
        esac"
}

(Warning, untested code. Error detection for syntactically invalid stems and keys is not provided.)
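Since the stem and key end up inside eval'd code, passing unchecked input to these functions reintroduces the same kind of code injection discussed earlier. As a hedged sketch, the two stated restrictions could be checked first. The helper names akey_ok and astem_ok are hypothetical (not part of the functions above), and they only enforce the rules as stated; the characters are spelled out instead of using ranges, so the check does not depend on the locale.

```sh
#!/bin/sh
# akey_ok WORD (hypothetical): succeed only if WORD is non-empty
# and made solely of the characters 0-9A-Z_a-z.
akey_ok () {
  case $1 in
    ''|*[!0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz]*)
      return 1;;
  esac
}
# astem_ok WORD (hypothetical): same as akey_ok, and additionally
# reject two consecutive underscores.
astem_ok () {
  akey_ok "$1" || return 1
  case $1 in
    *__*) return 1;;
  esac
}
```

For instance, akey_ok "$key" || exit 1 before calling aset mystem "$key" "$value".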

Bash – Is “Arithmetic Expansion” the expected action on vars inside `[[` tests

From man ksh:

An arithmetic expression uses the same syntax, precedence, and associativity of expression as the C language. All the C language operators that apply to floating point quantities can be used... Variables can be referenced by name within an arithmetic expression without using the parameter expansion syntax. When a variable is referenced, its value is evaluated as an arithmetic expression...

A conditional expression is used with the [[ compound command to test attributes of files and to compare strings. Field splitting and file name generation are not performed on the words between [[ and ]]. Each expression can be constructed from one or more of the following unary or binary expressions...

The following obsolete arithmetic comparisons are also permitted:

exp1 -eq exp2

True, if exp1 is equal to exp2.

exp1 -ne exp2

True, if exp1 is not equal to exp2.

exp1 -lt exp2

True, if exp1 is less than exp2.

exp1 -gt exp2

True, if exp1 is greater than exp2.

exp1 -le exp2

True, if exp1 is less than or equal to exp2.

exp1 -ge exp2

True, if exp1 is greater than or equal to exp2.

The documentation there is consistent where references to arithmetic expressions are concerned, and (apparently carefully) avoids any self-contradictions surrounding the definition of the [[ compound command ]] pertaining to string comparison by explicitly also permitting some obsolete arithmetic comparisons in the same context.

From man bash:

[[ expression ]]

Return a status of 0 or 1 depending on the evaluation of the conditional expression expression. Expressions are composed of the primaries described below... Word splitting and pathname expansion are not performed on the words between the [[ and ]]; ~tilde expansion, ${parameter} and $variable expansion, $((arithmetic expansion)), $(command substitution), <(process substitution), and "\'quote removal are performed. Conditional operators such as -f must be unquoted to be recognized as primaries...

A variable may be assigned to by a statement of the form:

name=[value]

If value is not given, the variable is assigned the null string. All values undergo ~tilde expansion, ${parameter} and $variable expansion, $(command substitution), $((arithmetic expansion)), and "\'quote removal... If the variable has its integer attribute set, then value is evaluated as an $((arithmetic expression)) even if the $((...)) expansion is not used...

The shell allows arithmetic expressions to be evaluated, under certain circumstances... Evaluation is done in fixed-width integers with no check for overflow, though division by 0 is trapped and flagged as an error. The operators and their precedence, associativity, and values are the same as in the C language.

Shell variables are allowed as operands; parameter expansion is performed before the expression is evaluated. Within an expression, shell variables may also be referenced by name without using the parameter expansion syntax. The value of a variable is evaluated as an arithmetic expression when it is referenced, or when a variable which has been given the integer attribute using declare -i is assigned a value... A shell variable need not have its integer attribute turned on to be used in an expression.

Conditional expressions are used by the [[ compound command and the test and [ builtin commands to test file attributes and perform string and arithmetic comparisons...

arg1 OP arg2

OP is one of -eq, -ne, -lt, -le, -gt, or -ge. These arithmetic binary operators return true if arg1 is equal to, not equal to, less than, less than or equal to, greater than, or greater than or equal to arg2, respectively. arg1 and arg2 may be positive or negative integers.

I think, given all that context, the behavior you observe stands to reason, even if it is not explicitly spelled out as a possibility in the documentation there. The docs do point to special treatment of parameters with integer attributes, and clearly denote a difference between a compound command and a builtin command.

The [[ comparison is syntax in the same sense that the assignment name=value is syntax or case word in ... is syntax. test and [, however, are not syntax as such, but rather separate procedures which take arguments. I think the best way to really get a feel for the differences is to have a look at the shells' error output:

set   '[[ \\ -eq 0 ]]' '[ \\ -eq 0 ]'
for    sh in   bash ksh
do     for     exp
       do     "$sh" -c  "$1||$2"
               set "$2" "$1"
done;  done

bash: [[: \: syntax error: operand expected (error token is "\")
bash: line 0: [: \: integer expression expected
bash: line 0: [: \: integer expression expected
bash: [[: \: syntax error: operand expected (error token is "\")
ksh: \: arithmetic syntax error
ksh: [: \: arithmetic syntax error
ksh: \: arithmetic syntax error

The two shells handle the exceptions differently, but the underlying reasons for the differences in both cases for both shells are very similar.

bash directly calls the [[ \\ case a syntax error - in the same way it might for a redirect from a non-existent file, for example - though it goes on from that point (incorrectly, I believe) to evaluate the other side of the || (or) expression. bash does give the [[ expression a command name in its error output, but note that it doesn't bother reporting the line number on which you call it, as it does for the [ command. bash's [ complains about not receiving what it expects to be an integer expression as an argument, but [[ need not complain in that way because it doesn't really take arguments, and never needs to expect anything at all when it is parsed alongside the expansions themselves.
ksh halts altogether on the [[ syntax error and doesn't bother with [ at all. It writes the same error message for both, but note that [ is given a command name there, whereas [[ is just ksh. The [ is only called after the command line has been successfully parsed and expansions have already occurred - it will do its own little getopts routine and get its own arg[0c] and the rest, but [[ is handled as underlying shell syntax once again.

I consider the bash docs slightly less clear than the ksh version in that they use the terms arg[12] rather than expression regarding integer comparisons, but I think it is done merely because [[, [, and test are all lumped together at that juncture, and the latter two do take arguments whereas the former only ever receives an expression.

In any case, where the integer comparison is not ambiguous in the syntax context, you can basically do any valid math operation mid-expression:

   m=5+5  a[m]=10
[[     m   -eq 10 ]] &&
[[     m++ -eq 10 ]] &&
[[     m-- -gt 10 ]] &&
[[ ${a[m]}  == 10 ]] &&
echo "math evals"

math evals
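A minimal bash sketch of the same difference in one comparison (x and y are illustrative names): inside [[ ]], the operands of -eq are arithmetic expressions, so a variable holding an expression is evaluated recursively, while the [ builtin insists on an integer argument and reports an error instead.

```bash
#!/usr/bin/env bash
y=4
x='y + 1'
# [[ ]]: x is evaluated as an expression: x -> y + 1 -> 5
if [[ x -eq 5 ]]; then dbl=true; else dbl=false; fi
# [: "y + 1" is not an integer, so the test fails with an error
if [ "$x" -eq 5 ] 2>/dev/null; then sgl=true; else sgl=false; fi
echo "[[: $dbl  [: $sgl"   # → [[: true  [: false
```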
