mawk (and nawk) provide only synthesised multi-dimensional arrays.
gawk provides (since 4.0, thx manatwork) true multi-dimensional arrays, though the man page (IMHO) misdirects a little: immediately after introducing if ((i,j) in array) it follows with "The in construct may also be used in a for loop to iterate over all the elements of an array." (fixed since v4.1.1!).
But for ((i,j) in array) is not the way to iterate over these; the gawk way is (as you used originally):
for (i in array)
    for (j in array[i])
        print array[i][j]
With nawk/mawk you are stuck with synthesised multi-dimensional arrays, so:
for (ij in A) {
    split(ij,xx,SUBSEP);
    printf("A[%s,%s]=%s\n",xx[1],xx[2],A[ij])
}
Now, your next problem is going to be ordering: array indexes are implicitly string type, and arrays are implicitly unordered, so you cannot rely on traversal order unless you have separate knowledge of the indexes, as would be the case with a simple non-sparse array with consecutive integer indexes from 0..N. gawk offers a solution for an ordered in: the PROCINFO["sorted_in"] setting (since 4.0).
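To make the ordering point concrete, here is a minimal sketch of an ordered in, assuming GNU awk 4.0 or later is installed as gawk. The function name sorted_demo and the sample data are made up for illustration.

```sh
# Assumes GNU awk >= 4.0 is available as "gawk"; sorted_demo is a hypothetical name.
sorted_demo() {
    gawk 'BEGIN {
        A[10] = "ten"; A[9] = "nine"; A[100] = "hundred"
        # Ask gawk to visit the indexes in ascending numeric order.
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (i in A)
            print i, A[i]
    }'
}
```

Calling sorted_demo should list the indexes as 9, 10, 100. Without the PROCINFO line the traversal order is unspecified, and nawk/mawk have no equivalent setting.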
If you know the indexes of a synthesised array, then you can use A[i,j] (which is treated as A[i SUBSEP j]), or for/in and some string splitting to rebuild the list of i and j, or if ((i,j) in A) (test for presence, without autovivification of indexes).
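The difference between testing with in and simply referencing an element is easy to miss, so here is a small sketch that should run in any POSIX-style awk. The names presence_demo and count are hypothetical helpers, not anything from the question.

```sh
# presence_demo and count are hypothetical names used only for this sketch.
presence_demo() {
    awk 'BEGIN {
        A[1,2] = "x"
        # "in" only tests for the key; it does not create A[3,4].
        if (!((3,4) in A)) n_before = count(A)
        # Merely referencing A[3,4] creates it with an empty value.
        if (A[3,4] == "") n_after = count(A)
        print n_before, n_after
    }
    # Count elements with a loop, since length(array) is not available everywhere.
    function count(arr,   k, n) { for (k in arr) n++; return n + 0 }'
}
```

presence_demo should print "1 2": the in test leaves the array at one element, while the plain reference silently adds a second one.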
In gawk you cannot use (i,j) in arr where arr is a true multi-dimensional array; you need to break it into two (or however many dimensions) for loops as above. To be fully correct though, each inner loop should be guarded by an isarray() condition, since it's not required that every arr[i] in turn be an array; gawk is happy to allow scalars too.
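A guarded traversal along those lines might look like the sketch below, again assuming gawk 4.0 or later. The function name walk_demo and the data are invented, and the sorted_in line is there only so the output order is predictable.

```sh
# Assumes GNU awk >= 4.0 is available as "gawk"; walk_demo is a hypothetical name.
walk_demo() {
    gawk 'BEGIN {
        arr["a"]["x"] = 1      # arr["a"] is a subarray
        arr["b"] = 2           # arr["b"] is a plain scalar
        PROCINFO["sorted_in"] = "@ind_str_asc"   # only to make the output order stable
        for (i in arr) {
            if (isarray(arr[i])) {
                for (j in arr[i])
                    print i, j, arr[i][j]
            } else
                print i, arr[i]
        }
    }'
}
```

walk_demo should print "a x 1" and then "b 2". Without the isarray() check, the inner for would be a fatal error in gawk as soon as it reached the scalar arr["b"].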
I know of no mawk-specific documentation other than the man page; it aims to be a standard new awk (i.e. nawk) implementation (so no true multi-dimensional arrays, no index sorting, and no isarray()).
With GNU awk 4.1.3 in bash on cygwin:
$ cat tst.sh
#!/bin/awk -f
BEGIN { print "Executing:", ENVIRON["_"] }
$ ./tst.sh
Executing: ./tst.sh
I don't know how portable that is. As always, though, I wouldn't execute an awk script using a shebang in a shell script as it just robs you of possible functionality. Keep it simple and just do this instead:
$ cat tst2.sh
awk -v cmd="$0" '
BEGIN { print "Executing:", cmd }
' "$@"
$ ./tst2.sh
Executing: ./tst2.sh
That last will work with any modern awk in any shell on any platform.
Interval regexp operators are supported in POSIX compliant awk implementations.
But as awk initially didn't support them (neither did nawk nor mawk nor gawk), there are still several implementations that don't support them, like mawk, the one true awk (originally maintained by Brian Kernighan, the k in awk) until a few days ago, Solaris /bin/awk, Solaris /bin/nawk, the awk of most BSDs.
Like for egrep, several implementations objected to adding support for them as they would break backward compatibility (there was no similar problem for \{x,y\} in BREs as used by grep).
\w, \d, \D are perl regexp extensions which are generally not supported (busybox awk and gawk (when not in POSIX mode) support \w). The standard equivalents would be [[:alnum:]_], [[:digit:]], [^[:digit:]] respectively, but are not supported by mawk yet¹.
On Solaris, you'll want to use /usr/xpg4/bin/awk.
With older versions of GNU awk, you had to use the --re-interval option, or start it with POSIXLY_CORRECT=anything in the environment for the regex intervals to be supported.
With implementations that don't support them, you can use combinations of ?, + and *:
x{1,3} -> xx?x? or (x|xx|xxx)
x{1,} -> x+
x{0,} -> x*
x{3,} -> xxx+ or xxxx*
x{3,6} -> xxxx?x?x?
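If you are not sure which camp your awk falls in, you can probe it at run time. The sketch below is one way to do that, not an official feature test; has_intervals and fallback_demo are hypothetical names. It relies on the fact that an awk without interval support treats the braces as literal characters, so the pattern fails to match.

```sh
# Prints "yes" if the awk found first on PATH honours x{n,m}, "no" otherwise.
has_intervals() {
    if awk 'BEGIN { exit !("aaa" ~ /^a{3}$/) }'; then
        echo yes
    else
        echo no
    fi
}

# Portable stand-in for /^x{1,3}$/ using only ? (works with or without intervals).
fallback_demo() {
    printf '%s\n' x xx xxx xxxx | awk '/^xx?x?$/'
}
```

fallback_demo should print x, xx and xxx but not xxxx, whichever awk is installed.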
¹ anyway, mawk doesn't support localisation or multi-byte characters, so you might as well restrict to ASCII characters and use [_a-zA-Z0-9], [0-9] and [^0-9].