Shell – Assigning a new value directly into a character index of a value in an array with zsh

arrayshell-scriptvariablezsh

If I have an array of strings and I want to change a single character in one value I could do this:

$ array=(hello world)
$ array[2]=${array[2]:0:2}X${array[2]:3}
$ echo $array[2]
woXld

Even though this solution works it's slow for very long strings as it actually reassigns the value at the index and has to expand both sides of the original value, before and after the changed character.

While it's possible to address the character indices of individual values in an array:

$ echo ${array[2][3]}
X

and to assign a new value into a character index for a scalar variable:

$ string='hello world'
$ string[9]=X
$ echo $string
hello woXld

a similar syntax doesn't work for arrays:

$ array[2][3]=X
zsh: no matches found: array[2][3]=X

Is there a way to do what I want faster than with the first method?

Best Answer

There does not really seem to be a way to do this elegantly. This is probably because zsh does not actually support nested arrays and the syntax is therefore not fully developed.

One thing you could try is to use a temporary variable instead of slicing around the character you want to change:

array=(hello world)
tmp=$array[2]
tmp[3]=X
array[2]=$tmp

Whether it is actually faster seems to depend on the length of $tmp or possibly the array as a whole.

I did a bit of performance testing and got some interesting results: If you are handling only scalars, then replacing a single charcter by index (method A)

foo[500]=X

seems to be always much faster than slicing the left and right partitions and building a new string (method B)

foo=${foo:0:499}X${foo:500}

I put both inside a loop with 100,000 iterations for strings of length 100, 1,000, 10,000 and 100,000 with the following results:

./scalar_A100.zsh 100000:         0.16s
./scalar_A1000.zsh 100000:        0.29s
./scalar_A10000.zsh 100000:       1.66s
./scalar_A100000.zsh 100000:     14.63s

 ./scalar_B100.zsh 100000:        0.42s
 ./scalar_B1000.zsh 100000:       1.17s
 ./scalar_B10000.zsh 100000:      5.39s
 ./scalar_B100000.zsh 100000:    46.23s

Unfortunately, when the string is inside of an array, it depends on the length of the string (or maybe the array itself).

For the array test, I used an array with two elements, the first being "hello" and the second being again a string with lengths between 100 and 100,000 characters.

Except for relativesly short strings, method A (via temporary variable)

foo=$bar[2]
foo[500]=O
bar[2]=$foo

is actually slower than replacing the array element inplace with slices:

bar[2]=${bar[2]:0:499}O${bar[2]:500}

This is due to the values actually being copied into the temporary variable and back into the array. Here are the results for that:

./array_A100.zsh 100000:      0.46s
./array_A1000.zsh 100000:     1.84s
./array_A10000.zsh 100000:   10.50s
./array_A100000.zsh 100000: 101.03s

./array_B100.zsh 100000:      0.60s
./array_B1000.zsh 100000:     1.35s
./array_B10000.zsh 100000:    3.17s
./array_B100000.zsh 100000:  22.13s

Also note that handling arrays is slower than scalars in every case.

Related Solutions

Bash – the rationale behind $array not expanding the whole array in ksh and bash

I can't give an answer but suggest some possible explanations.

It's true that except for ksh and its clones (pdksh and further derivatives and bash), all other shells with arrays (csh, tcsh, rc, es, akanga, fish, zsh, yash) have $array expand to all the members of the array.

But in both yash and zsh (when in sh emulation), the two Bourne-like shells in that list, that expansion is still subject to split+glob (and still empty removal even in zsh even when not in sh emulation), so you still need to use the awkward "${array[@]}" syntax (or "${(@)array}" or "$array[@]" in zsh which are hardly easier to type) to preserve the list (csh and tcsh have similar issues). That split+glob and empty removal is the Bourne heritage (itself to some extent caused by the Thompson shell heritage where $1 was more like macro expansion).

rc and fish are two examples of later shells that don't have the Bourne baggage and with a cleaner approach. They acknowledge the fact that a shell being a command line interpreter, the primary things they deal with is lists (the list of arguments to commands), so list/array is the primary data type (there's only one type and it's lists in rc) and got rid of the split+glob-upon-expansion bug/misfeature of the Bourne shell (which is no longer needed now that the primary type is array).

Still, that doesn't explain why David Korn chose not to have $array expand to all elements but to the element of index 0.

Now, apart from csh/tcsh, all those shells are much newer than ksh, developed in the early 80s only a few years after the Bourne shell and Unix V7 were released. Unix V7 was the one that also introduced the environ. That was the fancy new thing at time. The environment is neat and useful but environment variables can't contain arrays unless you use some form of encoding.

That's only conjecture but I suspect one reason for David Korn to choose that approach was so that the interface with the environment was not modified.

In ksh88, like rc, all variables were arrays (sparse though; a bit like associative arrays with keys limited to positive integers which is another oddity compared to other shells or programming languages, and you could tell it hadn't been completely thought through as it was impossible for instance to retrieve the list of keys). In that new design, var=value became short for var[0]=value. You could still export all your variables, but export var exports the element of index 0 of the array to the environment.

rc does put all its variables in the environment, fish supports exporting arrays, but to do that for arrays with more than one element, (at least for the port to Unix for rc which comes from plan9), they have to resort to some form of encoding which is only understood by them.

csh, tcsh, zsh don't support exporting arrays (though nowadays that may not sound like a big limitation). You can export arrays in yash, but they're exported as an environment variable that's the array elements joined with : (so (a "" "" b) and (a : b) are exported to the same value) and there's no converting back to array on importing.

Another possible justification might have been the consistency with Bourne's $@/$* (but then why have array indices start at 0 instead of 1 (another oddity compared to other shells/languages of the time)?). ksh was not free software, it was a commercial enterprise, one of the requirements was Bourne compatibility. ksh did remove the field splitting done on every non-quoted word in list context (as that was clearly not useful in the Bourne shell) but had to keep it for expansions (as scripts did use things like var="file1 file2"; cmd $var as the Bourne shell had no array but "$@"). Keeping that in a shell that otherwise has arrays makes little sense, but Korn had little other option if Ksh was to still be able to interpret scripts of the consumer base. If $scalar was subject to split+glob, $array would have to be for consistency, and so "${array[@]}" as a generalisation of "$@" made some sense. zsh had no similar constraint so was free to remove the split+glob upon expansions at the same time as adding arrays (but paid a price for breaking Bourne backward compatible).

Another explanation as offered by @Arrow might have been that he didn't want to overload the existing operators to make them behave differently for different types of variables (for instance ${#var} vs ${#array} though the Bourne shell didn't have that one or ${var-value}, ${var#pattern}) which can cause confusion for users (in zsh it's not always obvious how some operators work with array vs scalar).

Some related reading:

https://www.usenix.org/legacy/publications/library/proceedings/vhll/full_papers/korn.ksh.a where David Korn explains the development of ksh (arrays first added as patches over the Bourne shell for a form entry system).
https://news.slashdot.org/story/01/02/06/2030205/david-korn-tells-all

As to the a=$@ case in your edit, that's actually one case where ksh broke compatibility with the Bourne shell.

In the Bourne shell, $@ and $* contained the concatenation of the positional parameters with space characters. Only $@ when quoted was special as it expanded to the same as "$*" but with the inserted spaces not quoted (with special cases for the empty list in the newer versions where it has been addressed like on Solaris). You'll notice that if you remove space from $IFS, "$@" expands to just one argument in list contexts (0 for an empty list in the fixed versions mentioned above). When not quoted, $* and $@ behave like any other variable (split upon characters of $IFS, not necessarily on the original positional parameters). For instance, in the Bourne shell:

'set' 'a:b'   'c'
IFS=:
printf '<%s>\n' $@
printf '[%s]\n' "$@"

Would output:

<a>
<b c>
[a:b c]

Ksh88 changed that so that $@ and $* were joined with the first character of $IFS. "$@" in list context separates the positional parameters except when $IFS is empty.

When $IFS is empty, $* are joined on space, except for $* when quoted which is joined on with no separator.

Examples:

$ set a b
$ IFS=:
$ a=$@ b=$* c="$@" d="$*"
$ printf '<%s>\n' "$a" "$b" "$c" "$d" $@ $* "$@" "$*"
<a:b>
<a:b>
<a:b>
<a:b>
<a>
<b>
<a>
<b>
<a>
<b>
<a:b>
$ IFS=
$ a=$@ b=$* c="$@" d="$*"
$ printf '<%s>\n' "$a" "$b" "$c" "$d" $@ $* "$@" "$*"
<a b>
<a b>
<a b>
<ab>
<a b>
<a b>
<a b>
<ab>

You'll see a lot of variations in the different Bourne/Korn-like shells including ksh93 vs ksh88. There are also some variations in cases like:

set --
cmd ''"$@"
cmd $empty"$@"

Or when $IFS contains multi-byte characters, or bytes not forming valid characters.

Bash – Referencing bash array variables from another array

Bash 4.3 and later supports "name references", or namerefs (a similar concept exists in ksh93, but the scoping is annoyingly different):

#!/bin/bash

array1=('array1string1' 'array1string2')
array2=('array2string1' 'array2string2')

array_names=('array1' 'array2')

for a in "${array_names[@]}"; do
    declare -n arr="$a"

    for b in "${arr[@]}"; do
        echo "$b"
    done
done

The variable arr is a nameref that acts like an alias for the named variable (the variable with name $a in this example).

Without namerefs, in earlier Bash versions, one solution would be to create a new array that contains all the elements from the other arrays:

all=( "${array1[@]}" "${array2[@]}" )

... a bit like the array_names array in the question but with the contents of all arrays, and then iterate over "${all[@]}".

It's also possible to use eval, but the resulting code looks astoundingly awful.

See glenn jackman's answer for a variation with variable indirection (introduced in its current form with Bash version 2).

Best Answer

Related Solutions

Bash – the rationale behind $array not expanding the whole array in ksh and bash

Bash – Referencing bash array variables from another array

Related Question