Shells with associative arrays
Some modern shells provide associative arrays: ksh93, bash ≥4, zsh. In ksh93 and bash, if a is an associative array, then "${!a[@]}" is the array of its keys:
for k in "${!a[@]}"; do
echo "$k -> ${a[$k]}"
done
In zsh, that syntax only works in ksh emulation mode. Otherwise you have to use zsh's native syntax:
for k in "${(@k)a}"; do
echo "$k -> $a[$k]"
done
${(k)a} also works if a does not have an empty key.
In zsh, you could also loop on both keys and values at the same time:
for k v ("${(@kv)a}") echo "$k -> $v"
Shells without associative arrays
Emulating associative arrays in shells that don't have them is a lot more work. If you need associative arrays, it's probably time to bring in a bigger tool, such as ksh93 or Perl.
If you do need associative arrays in a mere POSIX shell, here's a way to simulate them, when keys are restricted to contain only the characters 0-9A-Z_a-z (ASCII digits, letters and underscore). Under this assumption, keys can be used as part of variable names. The functions below act on an array identified by a naming prefix, the “stem”, which must not contain two consecutive underscores.
## ainit STEM
## Declare an empty associative array named STEM.
ainit () {
eval "__aa__${1}=' '"
}
## akeys STEM
## List the keys in the associative array named STEM.
akeys () {
eval "echo \"\$__aa__${1}\""
}
## aget STEM KEY VAR
## Set VAR to the value of KEY in the associative array named STEM.
## If KEY is not present, unset VAR.
aget () {
eval "unset $3
case \$__aa__${1} in
*\" $2 \"*) $3=\$__aa__${1}__$2;;
esac"
}
## aset STEM KEY VALUE
## Set KEY to VALUE in the associative array named STEM.
aset () {
eval "__aa__${1}__${2}=\$3
case \$__aa__${1} in
*\" $2 \"*) :;;
*) __aa__${1}=\"\${__aa__${1}}$2 \";;
esac"
}
## aunset STEM KEY
## Remove KEY from the associative array named STEM.
aunset () {
eval "unset __aa__${1}__${2}
case \$__aa__${1} in
*\" $2 \"*) __aa__${1}=\"\${__aa__${1}%% $2 *} \${__aa__${1}#* $2 }\";;
esac"
}
(Warning, untested code. Error detection for syntactically invalid stems and keys is not provided.)
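Since error detection is left out above, a caller may want to check stems and keys before they reach eval. The following is only a minimal sketch: the helper names avalid_key and avalid_stem are hypothetical (they are not part of the functions above), and the bracket ranges assume an ASCII or C locale.

```shell
# Hypothetical helpers, not part of the functions above.
# The ranges below are only guaranteed to mean ASCII in the C locale.

## avalid_key NAME
## Succeed if NAME is non-empty and made only of 0-9A-Z_a-z.
avalid_key () {
  case $1 in
    '' | *[!0-9A-Z_a-z]*) return 1;;
  esac
}

## avalid_stem NAME
## Succeed if NAME is a valid key that has no two consecutive underscores.
avalid_stem () {
  avalid_key "$1" || return 1
  case $1 in
    *__*) return 1;;
  esac
}
```

A caller could then guard each operation, for example avalid_stem "$stem" && avalid_key "$key" && aset "$stem" "$key" "$value".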
From man ksh:
An arithmetic expression uses the same syntax, precedence, and associativity of expression as the C language. All the C language operators that apply to floating point quantities can be used... Variables can be referenced by name within an arithmetic expression without using the parameter expansion syntax. When a variable is referenced, its value is evaluated as an arithmetic expression...
A conditional expression is used with the [[ compound command to test attributes of files and to compare strings. Field splitting and file name generation are not performed on the words between [[ and ]]. Each expression can be constructed from one or more of the following unary or binary expressions...
The following obsolete arithmetic comparisons are also permitted:
exp1 -eq exp2 - True, if exp1 is equal to exp2.
exp1 -ne exp2 - True, if exp1 is not equal to exp2.
exp1 -lt exp2 - True, if exp1 is less than exp2.
exp1 -gt exp2 - True, if exp1 is greater than exp2.
exp1 -le exp2 - True, if exp1 is less than or equal to exp2.
exp1 -ge exp2 - True, if exp1 is greater than or equal to exp2.
The documentation there is consistent where references to arithmetic expressions are concerned, and (apparently carefully) avoids any self-contradictions surrounding the definition of the [[ compound command ]] pertaining to string comparison by explicitly also permitting some obsolete arithmetic comparisons in the same context.
From man bash:
[[ expression ]] - Return a status of 0 or 1 depending on the evaluation of the conditional expression expression. Expressions are composed of the primaries described below... Word splitting and pathname expansion are not performed on the words between the [[ and ]]; ~ tilde expansion, ${parameter} and $variable expansion, $((arithmetic expansion)), $(command substitution), <(process substitution), and "\' quote removal are performed. Conditional operators such as -f must be unquoted to be recognized as primaries...
A variable may be assigned to by a statement of the form:
name=[value]
If value is not given, the variable is assigned the null string. All values undergo ~ tilde expansion, ${parameter} and $variable expansion, $(command substitution), $((arithmetic expansion)), and "\' quote removal... If the variable has its integer attribute set, then value is evaluated as an $((arithmetic expression)) even if the $((...)) expansion is not used...
The shell allows arithmetic expressions to be evaluated, under certain circumstances... Evaluation is done in fixed-width integers with no check for overflow, though division by 0 is trapped and flagged as an error. The operators and their precedence, associativity, and values are the same as in the C language.
Shell variables are allowed as operands; parameter expansion is performed before the expression is evaluated. Within an expression, shell variables may also be referenced by name without using the parameter expansion syntax. The value of a variable is evaluated as an arithmetic expression when it is referenced, or when a variable which has been given the integer attribute using declare -i is assigned a value... A shell variable need not have its integer attribute turned on to be used in an expression.
Conditional expressions are used by the [[ compound command and the test and [ builtin commands to test file attributes and perform string and arithmetic comparisons...
arg1 OP arg2
OP is one of -eq, -ne, -lt, -le, -gt, or -ge. These arithmetic binary operators return true if arg1 is equal to, not equal to, less than, less than or equal to, greater than, or greater than or equal to arg2, respectively. arg1 and arg2 may be positive or negative integers.
I think, given all that context, the behavior you observe stands to reason, even if it is not explicitly spelled out as a possibility in the documentation there. The docs do point to special treatment of parameters with integer attributes, and clearly denote a difference between a compound command and a builtin command.
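The special treatment of the integer attribute can be seen in a small experiment. This is a sketch that assumes bash; the variable names plain and num are made up for the illustration.

```shell
#!/usr/bin/env bash
# Assumes bash. "plain" and "num" are illustrative names only.

plain=2+3      # ordinary variable: the text "2+3" is stored as is
declare -i num # give num the integer attribute
num=2+3        # the value is evaluated as arithmetic: num becomes 5

num+=1         # with the integer attribute, += adds: num becomes 6
plain+=1       # without it, += appends text: plain becomes "2+31"

echo "plain=$plain num=$num"
```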
The [[ comparison is syntax in the same sense that the assignment name=value is syntax or case word in... is syntax. test and [, however, are not syntax in that sense; they are separate procedures which take arguments. I think the best way to really get a feel for the differences is to have a look at shell error output:
set '[[ \\ -eq 0 ]]' '[ \\ -eq 0 ]'
for sh in bash ksh
do for exp
do "$sh" -c "$1||$2"
set "$2" "$1"
done; done
bash: [[: \: syntax error: operand expected (error token is "\")
bash: line 0: [: \: integer expression expected
bash: line 0: [: \: integer expression expected
bash: [[: \: syntax error: operand expected (error token is "\")
ksh: \: arithmetic syntax error
ksh: [: \: arithmetic syntax error
ksh: \: arithmetic syntax error
The two shells handle the exceptions differently, but the underlying reasons for the differences in both cases for both shells are very similar.
bash directly calls the [[ \\ case a syntax error - in the same way it might for a redirect from a non-existent file, for example - though it goes on from that point (as I believe, incorrectly) to evaluate the other side of the || or expression. bash does give the [[ expression a command name in error output, but note that it doesn't bother discussing the line number on which you call it as it does for the [ command. bash's [ complains about not receiving what it expects to be an integer expression as an argument, but [[ need not complain in that way because it doesn't really take arguments, and never needs to expect anything at all when it is parsed alongside the expansions themselves.
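The same split shows up with an operand that is a valid arithmetic expression but not an integer literal. This is a sketch that assumes bash (ksh93's [ evaluates its operands arithmetically, so it behaves differently); the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Assumes bash. Variable names are illustrative only.
s=2+3

# [[ is parsed as shell syntax: the operand s is an arithmetic
# expression, so the name s is looked up and 2+3 evaluates to 5.
if [[ s -eq 5 ]]; then compound=true; else compound=false; fi

# [ is a command: it receives the string "2+3" as an argument and
# rejects it because it is not an integer literal.
[ "$s" -eq 5 ] 2>/dev/null && builtin_status=0 || builtin_status=$?

echo "compound=$compound builtin_status=$builtin_status"
```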
ksh halts altogether when it hits the [[ syntax error and doesn't bother with [ at all. It writes the same error message for both, but note that [ is assigned a command name there where [[ is just ksh. The [ is only called after the command-line has been successfully parsed and expansions have already occurred - it will do its own little getopts routine and get its own arg[0c] and the rest, but [[ is handled as underlying shell syntax once again.
I consider the bash docs slightly less clear than the ksh version in that they use the terms arg[12] rather than expression regarding integer comparisons, but I think it is done merely because [[, [, and test are all lumped together at that juncture, and the latter two do take arguments whereas the former only ever receives an expression.
In any case, where the integer comparison is not ambiguous in the syntax context, you can basically do any valid math operation mid-expression:
m=5+5 a[m]=10
[[ m -eq 10 ]] &&
[[ m++ -eq 10 ]] &&
[[ m-- -gt 10 ]] &&
[[ ${a[m]} == 10 ]] &&
echo "math evals"
math evals
Best Answer
The problem is that in a shell arithmetic expression, such as inside $((...)) (POSIX), or ((...)) (ksh/bash/zsh), or array indices or arguments of some shell builtins or [[...]] operands, word expansions (${param}, $((...)), $[...], $(...), `...`, ${ ...; }) are performed first, and then the resulting text is interpreted as an arithmetic expression.
In the case of $((...)), that's even a POSIX requirement.
That allows things like op=+; echo "$(( 1 $op 2 ))" to work, and that explains why a=1+1; echo "$(($a * 2))" outputs 3 instead of 4, as it's the 1+1 * 2 expression that is evaluated.
That's also partly why using unsanitised data in arithmetic expressions is a security vulnerability in general.
What is easy to overlook is that it also applies in things like (( assoc[$var]++ )).
Above, except in ksh93, $var is expanded first, and the result interpreted.
That means that if $var contains @ or *, then the assoc[@]++ or assoc[*]++ expressions are evaluated, and @/* have special meanings there. If $var is x] + 2 + assoc[y, that becomes assoc[x] + 2 + assoc[y].
Now normally, in $(( $var )), even if $var contains something like $(reboot), there is no second round of expansion happening, reboot is not going to be run. But as already seen at Security Implications of using unsanitized data in Shell Arithmetic evaluation, there's an exception if that appears inside word[...] to allow recursive expansion. At the root of the problem is an unfortunate feature of the Korn shell whereby if var contains an arithmetic expression, then in $((var)) the arithmetic expression in $var is being evaluated, even recursively (like when var2='var3 + 1' var='var2 + 1'), something allowed, but not required by POSIX.
As that's extended to array members, that means that the contents of array indexes end up being evaluated recursively. So, if $var is $(reboot), then (( assoc[$var]++ )) ends up calling reboot.
ksh93 seems to have some level of workaround for it, but only when $var doesn't contain $ it seems. So, while ksh93 is OK with var=']', var='@', or var='`reboot`', it's not with $(reboot).
As an example, if we replace reboot with the harmless uname>&2:
The uname command does end up being run (twice in bash and zsh, I suppose once for getting the current value and the second time to perform the assignment).
In version 5.0, bash added an assoc_expand_once option which changes the behaviour:
is now OK, but it doesn't address the problems with @, *, or ] characters, so it doesn't address the arbitrary command execution vulnerability:
(this time, uname is being run as part of the evaluation of a plain array (b) index evaluation).
The list of problematic characters varies with the shell. $ is a problem for all three, \, `, [ and ] are a problem for bash and zsh, ", ' for bash. Same for the @ and * and empty values. Also note that in some locales, the encoding of some characters does contain that of \, [ or ] at least and could cause problems. How to escape those must be done differently in all three shells.
To work around it, one can do:
instead. That is:
- don't use the =, ++, --, +=, /=... arithmetic operators with an associative array member as the target.
- don't use assoc[$var], but ${assoc[$var]} (or $assoc[$var] in zsh), or (${assoc[$var]}) if that's meant to contain an arithmetic expression instead of just a number.
But, as always, the value of that associative array member must be under your control, preferably a plain number, and like for any other parameter expansion, it's preferable to put whitespace around them. For instance ((1 - $var)) is preferable to ((1-$var)) as the latter would cause problems for negative values (((1--1)) causes a syntax error in some shells as that's the -- operator applied to 1).
Another caveat is that when $var is empty, in (( 1 + var )), that var is still a token in the arithmetic expression syntax, and the corresponding value is 0. But in (( 1 + $var )), the arithmetic expression becomes 1 + which is a syntax error ((( $var + 1 )) is OK though, as that becomes + 1, invoking the unary + operator).
Other approaches with bash (when the assoc_expand_once option is not enabled) or zsh (but not ksh93 which still has a problem with ] and \ characters), are to delay the expansion until that second, recursive interpretation mentioned above.
- (( assoc[\$var]++ ))
- let 'assoc[$var]++' (make sure to use single quotes here)
- incr='assoc[$var]++'; (($incr)) (or even ((incr)))
- ((' assoc[$var]++ ')) or (( assoc['$var']++ )) (bash only).
Those have the advantage of preserving the exit status resulting from the arithmetic evaluation (success if non-zero), so one can do things like:
Now, that leaves one problem specific to the bash shell: bash associative arrays don't support empty keys. While assoc[]=x fails in both bash and zsh (not ksh93), assoc[$var] when $var is empty works in zsh or ksh93 but not bash. Even zsh's assoc+=('' value) syntax, now supported by bash-5.1, doesn't work in bash.
So if working with bash specifically and if an empty key is one of the possible values, the only option is to add a fixed prefix/suffix. So use for instance:
Or:
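A minimal sketch of the fixed-prefix idea, assuming bash 4 or later. The "." prefix, the array name count and the helper bump are arbitrary choices made for this illustration. The sketch also avoids arithmetic operators with the array member as target, expanding the member with ${...} and assigning the result instead.

```shell
#!/usr/bin/env bash
# Assumes bash >= 4. The "." prefix, "count" and "bump" are
# hypothetical names chosen for this sketch.
declare -A count=()

## bump KEY
## Increment the counter stored under ".KEY", so that an empty KEY
## is stored under the non-empty key ".".
bump () {
  # ${count[.$1]} is expanded to plain text before the arithmetic is
  # evaluated; an unset member leaves "+ 1", i.e. the unary plus.
  count[.$1]=$(( ${count[.$1]} + 1 ))
}

bump ''   # empty key, stored as "."
bump ''
bump a    # stored as ".a"

# Strip the prefix again when listing the keys.
for k in "${!count[@]}"; do
  echo "[${k#.}] -> ${count[$k]}"
done
```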