Mawk – Walking Multidimensional Arrays in Mawk

awk mawk

I'm able to do this properly in gawk, but when I tried to post my code to the machine where it will run, I realized it was using mawk…

$ cat multidim.gawk
# test of multidimensional arrays
// {
        A[1][1]="A11"
        A[1][2]="A12"
        A[2][1]="A21"
        A[2][2]="A22"

        i=2
        for ( j in A[i] )
        {
                print "i=" i " j=" j " A[i][j]=" A[i][j]
        }
}


$ echo hi | awk -f multidim.gawk
i=2 j=1 A[i][j]=A21
i=2 j=2 A[i][j]=A22

It seems mawk has a different idea about how multidimensional arrays should work. When I run the script on Debian with mawk, I get a syntax error. A[i,j] seems to be the correct syntax there, and it 'synthesizes' multidimensional arrays.

So I tried two things, neither of which works:

$ cat multidim.mawk
// {
        A[1,1]="A11"
        A[1,2]="A12"
        A[2,1]="A21"
        A[2,2]="A22"

        i=2
        for ( j in A[i] )
        {
                print "i=" i " j=" j " A[i,j]=" A[i,j]
        }
}

$ echo hi | awk -f multidim.mawk 
awk: multidim.mawk: line 9: syntax error at or near [

That seems sensible: using a one-dimensional index on a "multidimensional" array generates an error.

Next I tried to walk the WHOLE array and use an if statement to select the first dimension (extremely inefficient and horrible), but I can't even do that:

$ cat multidim2.mawk
# test of multidimensional arrays
// { 
    A[1,1]="A11"    
    A[1,2]="A12"    
    A[2,1]="A21"    
    A[2,2]="A22"    

    for ( (i, j) in A )
    {
        print "i=" i " j=" j " A[i,j]=" A[i,j]
    }
}
$ echo hi | awk -f multidim2.mawk 
awk: multidim2.mawk: line 8: syntax error at or near )

Is there any way to walk a multidimensional array in mawk?

Is there a language reference other than the mawk manpage?

Thanks!

Best Answer

mawk (and nawk) provide only synthesised multi-dimensional arrays.

gawk provides (since 4.0, thanks manatwork) true multi-dimensional arrays, though the man page (IMHO) misdirects a little: immediately after introducing if ((i,j) in array) it follows with "The in construct may also be used in a for loop to iterate over all the elements of an array." (This has been fixed since v4.1.1.)

But for ((i,j) in array) is not the way to iterate over these. The gawk way is (as you used originally):

 for (i in array)
     for (j in array[i])
         print array[i][j]

With nawk/mawk you are stuck with synthesised multi-dimensional arrays, so you loop over the combined keys and split each one on SUBSEP:

for (ij in A) {
    split(ij,xx,SUBSEP);
    printf("A[%s,%s]=%s\n",xx[1],xx[2],A[ij])
}
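
To get what the question was after (walking only the elements whose first index is 2), the same split can be combined with a test on the first component. This is a minimal sketch, not the only way to do it: walk_row and idx are names made up for illustration, and the output goes through sort because for/in order is unspecified.

```shell
# Hypothetical helper: print the elements of "row" $1 of a synthesised 2-D array.
walk_row() {
    awk -v want="$1" '
    BEGIN {
        A[1,1] = "A11"; A[1,2] = "A12"
        A[2,1] = "A21"; A[2,2] = "A22"

        for (ij in A) {
            split(ij, idx, SUBSEP)      # idx[1] = i, idx[2] = j
            if (idx[1] == want)         # keep only the requested first index
                print "i=" idx[1] " j=" idx[2] " A[i,j]=" A[ij]
        }
    }' | sort                           # for/in order is unspecified
}

walk_row 2
```

This still visits every element of A, so it is linear in the size of the whole array, not in the size of one row.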

Your next problem is going to be ordering: array indexes are implicitly strings, and arrays are implicitly unordered, so for/in gives you no particular order unless you have separate knowledge of the indexes (as would be the case with a simple non-sparse array with consecutive integer indexes 0..N). gawk offers a solution for an ordered in (PROCINFO["sorted_in"], since 4.0); mawk does not.
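
Without an ordered in, one portable option is to print the split indexes next to each value and let an external sort do the ordering. A rough sketch, assuming numeric indexes and values without embedded whitespace; sorted_walk and the sample data are invented for illustration:

```shell
# Hypothetical helper: dump a synthesised 2-D array ordered by (i, j).
sorted_walk() {
    awk '
    BEGIN {
        A[10,1] = "x"; A[2,2] = "y"; A[2,1] = "z"; A[1,3] = "w"

        for (ij in A) {
            split(ij, idx, SUBSEP)
            print idx[1], idx[2], A[ij]     # one "i j value" record per element
        }
    }' | sort -k1,1n -k2,2n                 # numeric sort on i, then on j
}

sorted_walk
```

The numeric sort keys matter here: a plain string sort would place 10 before 2.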

If you know the indexes of a synthesised array, then you can use A[i,j] (which is treated as A[i SUBSEP j]), or for/in and some string splitting to rebuild the list of i and j, or if ((i,j) in A) (test for presence, without autovivification of indexes).
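
As an illustration of the last option, here is a small sketch that walks a sparse synthesised array over a known index range and counts the elements afterwards to show that the membership test created nothing. The 1..2 bounds and the name sparse_walk are assumptions for the example.

```shell
# Hypothetical helper: visit known index ranges, skipping missing elements.
sparse_walk() {
    awk '
    BEGIN {
        A[1,1] = "A11"; A[2,2] = "A22"      # sparse: (1,2) and (2,1) are absent

        for (i = 1; i <= 2; i++)
            for (j = 1; j <= 2; j++)
                if ((i,j) in A)             # presence test, creates no element
                    print "A[" i "," j "]=" A[i,j]

        n = 0
        for (k in A) n++                    # still only the two original elements
        print "elements=" n
    }'
}

sparse_walk
```

Referencing A[i,j] directly in the condition instead of (i,j) in A would create the missing elements as a side effect.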

In gawk you cannot use (i,j) in arr where arr is a true multi-dimensional array; you need to break it into two (or however many dimensions) for loops as above. To be fully correct, though, the inner loop(s) should be guarded by an isarray() check, since it's not required that every element of arr[i] in turn be an array: gawk is happy to allow scalars too.
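
A gawk-only sketch of that guarded nested loop follows (it needs gawk 4.0 or later and will not run under mawk). The wrapper name walk_true_2d and the sample data are made up, and PROCINFO["sorted_in"] is set only to make the order predictable.

```shell
# Hypothetical helper: walk a true multi-dimensional gawk array that mixes
# subarrays and scalars. Does nothing if gawk is not installed.
walk_true_2d() {
    command -v gawk >/dev/null 2>&1 || return 0
    gawk '
    BEGIN {
        PROCINFO["sorted_in"] = "@ind_num_asc"   # ordered for/in (gawk only)
        arr[1][1] = "A11"; arr[1][2] = "A12"
        arr[2] = "scalar"                        # not every element is a subarray

        for (i in arr) {
            if (isarray(arr[i])) {
                for (j in arr[i])
                    print "arr[" i "][" j "]=" arr[i][j]
            } else {
                print "arr[" i "]=" arr[i]
            }
        }
    }'
}

walk_true_2d
```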

I know of no mawk-specific documentation other than the man page. mawk aims to be a standard new awk (i.e. nawk) implementation, so it has no true multi-dimensional arrays, no index sorting, and no isarray().

Related Solutions

Awk comparison using arrays

I don't see why you would want to do it in a single awk command, what you have seems perfectly fine. Anyway, here's one way:

$ awk -F, '(max[$18]<$21 || max[$18]==""){max[$18]=$21;line[$18]=$0}
            END{for(key in line){print line[key]}}' file
6598,6598,0,1,,1,0,1,1,0,0,0,1,0,0,0,0,1390,1390,,0.730000,
1297,1297,0,0,,0,0,1,0,0,0,0,0,1,0,1,0,1707,1707,,7.000000,
6553,6553,0,1,,1,0,1,1,0,0,0,0,1,0,1,0,4326,4326,,9.000000,

The idea is very simple. We have two arrays, both keyed by $18: max holds the largest $21 seen so far for that key, and line holds the line on which it was seen. For every line, if the saved value for $18 is smaller than $21, or if there is no value stored for $18 yet, we store $21 in max and the current line ($0) in line. Finally, in the END{} block, we print array line.

Note that the script above treats $18 as a string. Therefore, 001 and 1 will be considered different strings.
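
If keys such as 001 and 1 should be treated as the same group, one option is to force the key to a number before using it as an index. The sketch below uses columns 1 and 2 of made-up input in place of $18 and $21, and max_by_key is a hypothetical name.

```shell
# Hypothetical helper: keep the line with the largest column 2 per numeric
# value of column 1.
max_by_key() {
    awk -F, '
    {
        key = $1 + 0                        # "001" and "1" both become 1
        if (!(key in max) || $2 + 0 > max[key]) {
            max[key]  = $2 + 0
            line[key] = $0
        }
    }
    END { for (k in line) print line[k] }' | sort   # for/in order is unspecified
}

printf '%s\n' '001,5' '1,7' '2,3' | max_by_key
```

Here 001,5 and 1,7 fall into the same group, so only 1,7 and 2,3 are printed.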

How to print own script name in mawk

With GNU awk 4.1.3 in bash on cygwin:

$ cat tst.sh
#!/bin/awk -f
BEGIN { print "Executing:", ENVIRON["_"] }

$ ./tst.sh
Executing: ./tst.sh

I don't know how portable that is. As always, though, I wouldn't execute an awk script using a shebang in a shell script as it just robs you of possible functionality. Keep it simple and just do this instead:

$ cat tst2.sh
awk -v cmd="$0" '
BEGIN { print "Executing:", cmd }
' "$@"

$ ./tst2.sh
Executing: ./tst2.sh

That last approach will work with any modern awk in any shell on any platform.
