Bash – Inverting an associative array

Tags: associative array, bash, scripting, zsh

Let's say I have an associative array in bash,

declare -A hash
hash=(
    ["foo"]=aa
    ["bar"]=bb
    ["baz"]=aa
    ["quux"]=bb
    ["wibble"]=cc
    ["wobble"]=aa
)

where both keys and values are unknown to me (the actual data is read from external sources).

How may I create an array of the keys corresponding to the same value, so that I may, in a loop over all unique values, do

printf 'Value "%s" is present with the following keys: %s\n' "$value" "${keys[*]}"

and get the output (not necessarily in this order)

Value "aa" is present with the following keys: foo baz wobble
Value "bb" is present with the following keys: bar quux
Value "cc" is present with the following keys: wibble

The important bit is that the keys are stored as separate elements in the keys array and that they therefore do not need to be parsed out of a text string.

I could do something like

declare -A seen
seen=()
for value in "${hash[@]}"; do
    if [ -n "${seen[$value]}" ]; then
        continue
    fi

    keys=()
    for key in "${!hash[@]}"; do
        if [ "${hash[$key]}" = "$value" ]; then
            keys+=( "$key" )
        fi
    done

    printf 'Value "%s" is present with the following keys: %s\n' \
        "$value" "${keys[*]}"

    seen[$value]=1
done

But it seems a bit inefficient with that double loop.

Is there a piece of array syntax that I've missed for bash?

Would doing this in e.g. zsh give me access to more powerful array manipulation tools?

In Perl, I would do

my %hash = (
    'foo'    => 'aa',
    'bar'    => 'bb',
    'baz'    => 'aa',
    'quux'   => 'bb',
    'wibble' => 'cc',
    'wobble' => 'aa'
);

my %keys;
while ( my ( $key, $value ) = each(%hash) ) {
    push( @{ $keys{$value} }, $key );
}

foreach my $value ( keys(%keys) ) {
    printf( "Value \"%s\" is present with the following keys: %s\n",
        $value, join( " ", @{ $keys{$value} } ) );
}

But bash associative arrays can't hold arrays…

I'd also be interested in any old school solution possibly using some form of indirect indexing (building a set of index array(s) when reading the values that I said I had in hash above?). It feels like there ought to be a way to do this in linear time.

Best Answer

zsh

to reverse keys <=> values

In zsh, where the primary syntax for defining a hash is hash=(k1 v1 k2 v2...) like in perl (newer versions also support the awkward ksh93/bash syntax for compatibility, though with variations when it comes to quoting the keys), you can do:

keys=("${(@k)hash}")
values=("${(@v)hash}")

typeset -A reversed
reversed=("${(@)values:^keys}") # array zipping operator

or using a loop:

for k v ("${(@kv)hash}") reversed[$v]=$k

The @ and the double quotes are there to preserve empty keys and values (note that bash associative arrays don't support empty keys). As the expansion of elements in associative arrays is in no particular order, if several elements of $hash have the same value (which will end up being a key in $reversed), you can't tell which key will be used as the value in $reversed.

for your loop

You'd use the R hash subscript flag to get elements based on value instead of key, combined with e for exact (as opposed to wildcard) match, and then get the keys for those elements with the k parameter expansion flag:

for value ("${(@u)hash}")
  print -r "elements with '$value' as value: ${(@k)hash[(Re)$value]}"

your perl approach

zsh (contrary to ksh93) doesn't support arrays of arrays, but its variables can contain the NUL byte, so you could use that to separate elements if the elements don't otherwise contain NUL bytes, or use ${(q)var} / ${(Q)${(z)var}} to encode/decode a list using quoting.

typeset -A seen
for k v ("${(@kv)hash}")
  seen[$v]+=" ${(q)k}"

for k v ("${(@kv)seen}")
  print -r "elements with '$k' as value: ${(Q@)${(z)v}}"

ksh93

ksh93 was the first shell to introduce associative arrays in 1993. The syntax for assigning values as a whole means it's very difficult to do it programmatically contrary to zsh, but at least it's somewhat justified in ksh93 in that ksh93 supports complex nested data structures.

In particular, here ksh93 supports arrays as values for hash elements, so you can do:

typeset -A seen
for k in "${!hash[@]}"; do
  seen[${hash[$k]}]+=("$k")
done

for k in "${!seen[@]}"; do
  print -r "elements with '$k' as value: ${seen[$k][@]}"
done

bash

bash added support for associative arrays decades later, copied the ksh93 syntax, but not the other advanced data structures, and doesn't have any of the advanced parameter expansion operators of zsh.

In bash, you could use the quoted list approach mentioned in the zsh section, using printf %q or, with newer versions, ${var@Q}.

typeset -A seen
for k in "${!hash[@]}"; do
  printf -v quoted_k %q "$k"
  seen[${hash[$k]}]+=" $quoted_k"
done

for k in "${!seen[@]}"; do
  eval "elements=(${seen[$k]})"
  echo -E "elements with '$k' as value: ${elements[@]}"
done
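With bash 4.4 or newer, the ${var@Q} variant mentioned above might look like the sketch below. The sample data (including the hypothetical "two words" key, added only to show that whitespace survives the round trip) and the variable names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.4+ for the ${var@Q} expansion.
declare -A hash=(
    [foo]=aa [bar]=bb [baz]=aa
    [quux]=bb [wibble]=cc [wobble]=aa
    ["two words"]=cc    # hypothetical key containing a space
)

declare -A seen
for k in "${!hash[@]}"; do
    # ${k@Q} expands to $k quoted so that it can be reused as shell input.
    seen[${hash[$k]}]+=" ${k@Q}"
done

for v in "${!seen[@]}"; do
    # Decode the quoted list back into separate array elements.
    eval "elements=(${seen[$v]})"
    printf 'Value "%s" has %d key(s):' "$v" "${#elements[@]}"
    printf ' <%s>' "${elements[@]}"
    printf '\n'
done
```

Each key ends up as its own element of the elements array, so nothing has to be parsed out of a string by hand. The eval is only reasonably safe here because everything in seen was produced by ${k@Q}.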

As noted earlier, however, bash associative arrays don't support the empty string as a key, so this won't work if some of $hash's values are empty. You could replace the empty string with a placeholder like <EMPTY>, or prefix the key with some character that you later strip for display.
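A minimal sketch of the prefix workaround, assuming an "x" prefix (any fixed character would do) and made-up sample data with two empty values:

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4+ for associative arrays.
declare -A hash=( [foo]=aa [bar]='' [baz]=aa [quux]='' )

declare -A seen
for k in "${!hash[@]}"; do
    printf -v quoted_k %q "$k"
    # Prefix every value with "x" so that the subscript is never empty.
    seen["x${hash[$k]}"]+=" $quoted_k"
done

for xv in "${!seen[@]}"; do
    v=${xv#x}    # strip the prefix again for display
    eval "elements=(${seen[$xv]})"
    printf 'Value "%s" is present with the following keys: %s\n' \
        "$v" "${elements[*]}"
done
```

A prefix has one advantage over a placeholder such as <EMPTY>: it cannot collide with a real value, because every value gets the same treatment.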

Shells with associative arrays

Some modern shells provide associative arrays: ksh93, bash ≥4, zsh. In ksh93 and bash, if a is an associative array, then "${!a[@]}" is the array of its keys:

for k in "${!a[@]}"; do
  echo "$k -> ${a[$k]}"
done

In zsh, that syntax only works in ksh emulation mode. Otherwise you have to use zsh's native syntax:

for k in "${(@k)a}"; do
  echo "$k -> $a[$k]"
done

${(k)a} also works if a does not have an empty key.

In zsh, you could also loop on both keys and values at the same time:

for k v ("${(@kv)a}") echo "$k -> $v"

Shells without associative arrays

Emulating associative arrays in shells that don't have them is a lot more work. If you need associative arrays, it's probably time to bring in a bigger tool, such as ksh93 or Perl.

If you do need associative arrays in a mere POSIX shell, here's a way to simulate them, when keys are restricted to contain only the characters 0-9A-Z_a-z (ASCII digits, letters and underscore). Under this assumption, keys can be used as part of variable names. The functions below act on an array identified by a naming prefix, the “stem”, which must not contain two consecutive underscores.

## ainit STEM
## Declare an empty associative array named STEM.
ainit () {
  eval "__aa__${1}=' '"
}
## akeys STEM
## List the keys in the associative array named STEM.
akeys () {
  eval "echo \"\$__aa__${1}\""
}
## aget STEM KEY VAR
## Set VAR to the value of KEY in the associative array named STEM.
## If KEY is not present, unset VAR.
aget () {
  eval "unset $3
        case \$__aa__${1} in
          *\" $2 \"*) $3=\$__aa__${1}__$2;;
        esac"
}
## aset STEM KEY VALUE
## Set KEY to VALUE in the associative array named STEM.
aset () {
  eval "__aa__${1}__${2}=\$3
        case \$__aa__${1} in
          *\" $2 \"*) :;;
          *) __aa__${1}=\"\${__aa__${1}}$2 \";;
        esac"
}
## aunset STEM KEY
## Remove KEY from the associative array named STEM.
aunset () {
  eval "unset __aa__${1}__${2}
        case \$__aa__${1} in
          *\" $2 \"*) __aa__${1}=\"\${__aa__${1}%% $2 *} \${__aa__${1}#* $2 }\";;
        esac"
}

(Warning, untested code. Error detection for syntactically invalid stems and keys is not provided.)
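The same naming trick can be applied to the grouping problem from the question. The sketch below is an illustration rather than part of the functions above: it assumes the values (which become part of variable names) contain only 0-9A-Z_a-z, that keys contain no whitespace, and that the data arrives as "key value" lines. The names pairs, values and group__VALUE are hypothetical.

```shell
# Hypothetical input: one "key value" pair per line.
pairs='foo aa
bar bb
baz aa
quux bb
wibble cc
wobble aa'

values=' '    # space-delimited list of the distinct values seen so far
while read -r key value; do
  # Append the key to the variable group__VALUE (created on first use).
  eval "group__${value}=\"\${group__${value}:+\$group__${value} }\$key\""
  case $values in
    *" $value "*) ;;                  # value already recorded
    *) values="$values$value " ;;
  esac
done <<EOF
$pairs
EOF

for value in $values; do
  eval "keys=\$group__${value}"
  printf 'Value "%s" is present with the following keys: %s\n' "$value" "$keys"
done
```

This makes a single pass over the input, but a plain POSIX shell has no arrays, so the keys of each group end up in a space-separated string rather than as separate elements.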

Zparseopts with associative array in older version of zsh

Replace the last line of your script with echo ${(kv)opts}. Running with 4.3.6 and 5.0.6 should show that 4.3.6 interprets -K to reset opts if any options are given, while 5.0.6 only resets opts[--opt1] when --opt1 is used (leaving --opt2 or any other entry alone).

(Note this appeared to change sometime between 5.0.2 and 5.0.6; you might want to ask on the zsh-workers mailing list to confirm.)