Bash – Inverting an associative array

Tags: associative array, bash, scripting, zsh

Let's say I have an associative array in bash,

declare -A hash
hash=(
    ["foo"]=aa
    ["bar"]=bb
    ["baz"]=aa
    ["quux"]=bb
    ["wibble"]=cc
    ["wobble"]=aa
)

where both keys and values are unknown to me (the actual data is read from external sources).

How may I create an array of the keys corresponding to the same value, so that I may, in a loop over all unique values, do

printf 'Value "%s" is present with the following keys: %s\n' "$value" "${keys[*]}"

and get the output (not necessarily in this order)

Value "aa" is present with the following keys: foo baz wobble
Value "bb" is present with the following keys: bar quux
Value "cc" is present with the following keys: wibble

The important bit is that the keys are stored as separate elements in the keys array and that they therefore do not need to be parsed out of a text string.

I could do something like

declare -A seen
seen=()
for value in "${hash[@]}"; do
    if [ -n "${seen[$value]}" ]; then
        continue
    fi

    keys=()
    for key in "${!hash[@]}"; do
        if [ "${hash[$key]}" = "$value" ]; then
            keys+=( "$key" )
        fi
    done

    printf 'Value "%s" is present with the following keys: %s\n' \
        "$value" "${keys[*]}"

    seen[$value]=1
done

But it seems a bit inefficient with that double loop.

Is there a piece of array syntax that I've missed for bash?

Would doing this in e.g. zsh give me access to more powerful array manipulation tools?

In Perl, I would do

my %hash = (
    'foo'    => 'aa',
    'bar'    => 'bb',
    'baz'    => 'aa',
    'quux'   => 'bb',
    'wibble' => 'cc',
    'wobble' => 'aa'
);

my %keys;
while ( my ( $key, $value ) = each(%hash) ) {
    push( @{ $keys{$value} }, $key );
}

foreach my $value ( keys(%keys) ) {
    printf( "Value \"%s\" is present with the following keys: %s\n",
        $value, join( " ", @{ $keys{$value} } ) );
}

But bash associative arrays can't hold arrays…

I'd also be interested in any old-school solution, possibly using some form of indirect indexing (building a set of index arrays while reading the values that I said I had in hash above?). It feels like there ought to be a way to do this in linear time.

Best Answer

zsh

to reverse keys <=> values

In zsh, where the primary syntax for defining a hash is hash=(k1 v1 k2 v2...) as in Perl (newer versions also support the awkward ksh93/bash syntax for compatibility, though with variations when it comes to quoting the keys), you can do:

keys=("${(@k)hash}")
values=("${(@v)hash}")

typeset -A reversed
reversed=("${(@)values:^keys}") # array zipping operator

or using a loop:

for k v ("${(@kv)hash}") reversed[$v]=$k

The @ and the double quotes are there to preserve empty keys and values (note that bash associative arrays don't support empty keys). Because the elements of an associative array are expanded in no particular order, if several elements of $hash have the same value (which becomes a key in $reversed), you can't tell which of their keys will end up as the value in $reversed.
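For comparison, a minimal sketch of the same reversing loop in bash, assuming bash 4.0+ and the sample data from the question. It has the same caveat: when several keys share a value, which one ends up in $reversed depends on the unspecified iteration order. Unlike the zsh version, it also fails on empty values, since bash rejects an empty associative array subscript.

```bash
#!/usr/bin/env bash
# Sample data from the question.
declare -A hash=(
    [foo]=aa [bar]=bb [baz]=aa
    [quux]=bb [wibble]=cc [wobble]=aa
)

# Swap keys and values. Later assignments overwrite earlier ones, so for a
# value shared by several keys only one (arbitrary) key survives.
declare -A reversed=()
for k in "${!hash[@]}"; do
    reversed[${hash[$k]}]=$k
done

for v in "${!reversed[@]}"; do
    printf '%s -> %s\n' "$v" "${reversed[$v]}"
done
```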

for your loop

You'd use the R hash subscript flag to get elements based on value instead of key, combined with e for exact (as opposed to wildcard) match, and then get the keys for those elements with the k parameter expansion flag:

for value ("${(@u)hash}")
  print -r "elements with '$value' as value: ${(@k)hash[(Re)$value]}"

your perl approach

Unlike ksh93, zsh doesn't support arrays of arrays, but its variables can contain the NUL byte, so you could use that to separate elements as long as the elements themselves don't contain NUL bytes. Alternatively, use the ${(q)var} and ${(Q)${(z)var}} expansion flags to encode and decode a list using quoting:

typeset -A seen
for k v ("${(@kv)hash}")
  seen[$v]+=" ${(q)k}"

for k v ("${(@kv)seen}")
  print -r "elements with '$k' as value: ${(Q@)${(z)v}}"

ksh93

ksh93 was the first shell to introduce associative arrays, in 1993. Its syntax for assigning a whole associative array makes it much harder than in zsh to build one programmatically, but that is somewhat justified in ksh93 since it supports complex nested data structures.

In particular, here ksh93 supports arrays as values for hash elements, so you can do:

typeset -A seen
for k in "${!hash[@]}"; do
  seen[${hash[$k]}]+=("$k")
done

for k in "${!seen[@]}"; do
  print -r "elements with '$k' as value: ${seen[$k][@]}"
done

bash

bash added support for associative arrays much later (in version 4.0, released in 2009). It copied the ksh93 syntax, but not ksh93's other advanced data structures, and it has none of zsh's advanced parameter expansion operators.
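Since bash 4.3, namerefs (declare -n) can partly make up for the missing arrays-as-values: you can keep one ordinary indexed array per distinct value and reach it by name. The sketch below assumes bash 4.3+ and non-empty values; the keys_N variable names and the id_of/values bookkeeping arrays are hypothetical choices for this illustration, not a standard idiom. It makes a single pass over $hash, so it is linear in the number of elements, which is the kind of indirect indexing the question asks about.

```bash
#!/usr/bin/env bash
# Sketch: emulate "arrays as hash values" with namerefs (bash 4.3+).
declare -A hash=(
    [foo]=aa [bar]=bb [baz]=aa
    [quux]=bb [wibble]=cc [wobble]=aa
)

declare -A id_of=()   # hypothetical: distinct value -> small integer id
values=()             # hypothetical: id -> distinct value
n=0                   # number of distinct values seen so far

# Single pass over $hash: append each key to the array named keys_<id>.
for k in "${!hash[@]}"; do
    v=${hash[$k]}
    if [[ -z ${id_of[$v]+set} ]]; then
        id_of[$v]=$n
        values[n]=$v
        unset "keys_$n"              # start from a fresh, empty array
        n=$((n + 1))
    fi
    declare -n list="keys_${id_of[$v]}"
    list+=("$k")                     # appends to keys_<id> through the nameref
    unset -n list                    # removes the nameref, not its target
done

for ((i = 0; i < n; i++)); do
    declare -n list="keys_$i"
    printf 'Value "%s" is present with the following keys: %s\n' \
        "${values[i]}" "${list[*]}"
    unset -n list
done
```

The keys stay separate array elements throughout, so nothing has to be parsed back out of a string.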

In bash, you could use the quoted-list approach mentioned in the zsh section, quoting with printf %q or, in bash 4.4 and newer, ${var@Q}:
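To see why that encoding is safe, here is a small sketch of the round trip on its own, assuming bash 4.4+ for ${var@Q}; the variable names items, packed and unpacked are only illustrative. Strings containing spaces, quotes, $ and even the empty string come back as separate, unchanged elements.

```bash
#!/usr/bin/env bash
# Round trip: pack several strings into one scalar with ${var@Q} (bash 4.4+),
# then unpack them with eval. Variable names are illustrative.
items=("foo bar" "it's" '$HOME' "")

packed=
for item in "${items[@]}"; do
    packed+=" ${item@Q}"     # on older bash: printf -v q %q "$item"; packed+=" $q"
done

# Safe to eval only because every word in $packed was produced by @Q.
eval "unpacked=($packed)"

printf '<%s>\n' "${unpacked[@]}"
```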

typeset -A seen
for k in "${!hash[@]}"; do
  printf -v quoted_k %q "$k"
  seen[${hash[$k]}]+=" $quoted_k"
done

for k in "${!seen[@]}"; do
  eval "elements=(${seen[$k]})"
  echo -E "elements with '$k' as value: ${elements[*]}"
done

As noted earlier, however, bash associative arrays don't support the empty string as a key, so this won't work if some of $hash's values are empty. You could replace the empty string with a placeholder such as <EMPTY>, or prefix each key with some character that you later strip for display.
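A minimal sketch of the placeholder workaround, assuming bash 4.0+ and that the hypothetical sentinel string <EMPTY> never occurs as a real value in your data. Empty values are mapped to the sentinel when building $seen and mapped back only for display.

```bash
#!/usr/bin/env bash
# Sketch of the placeholder workaround for empty values (bash 4.0+).
declare -A hash=(
    [foo]=aa [bar]= [baz]=aa [quux]=
)

# Hypothetical sentinel: must be a string that never occurs as a real value.
placeholder='<EMPTY>'

declare -A seen=()
for k in "${!hash[@]}"; do
    v=${hash[$k]:-$placeholder}      # an empty value becomes the placeholder key
    printf -v quoted_k %q "$k"
    seen[$v]+=" $quoted_k"
done

for v in "${!seen[@]}"; do
    eval "elements=(${seen[$v]})"
    label=$v
    if [[ $label == "$placeholder" ]]; then
        label=''                     # show the original empty value
    fi
    printf 'Value "%s" is present with the following keys: %s\n' \
        "$label" "${elements[*]}"
done
```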
