AWK Scripting – Is AWK's END Behavior of Keeping the Last Line in $0 Documented in the Man Page?

awk

I read another answer that describes how to use AWK to view the last line of output:

$ seq 42 | awk 'END { print }'
42

So it seems like when the END block is run the last line is loaded in $0.

This surprised me because the first line isn't loaded into the BEGIN block:

$ seq 42 | awk 'BEGIN { print }'
#=> blank
  • Is this behavior documented anywhere? (I searched through the man page but didn't find anything.)

Best Answer

The BEGIN block is run before any input is processed, so $0 hasn’t been initialised yet.
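
As a small check of that point, the sketch below prints $0 together with NR and NF from inside BEGIN. The two-line input and the variable name begin_out are made up for illustration; because no record has been read yet, $0 should be empty and both counters zero.

```shell
# BEGIN runs before any input is read, so the sample lines are never seen.
begin_out=$(printf 'first\nsecond\n' | awk 'BEGIN { printf "[%s] NR=%d NF=%d\n", $0, NR, NF }')
echo "$begin_out"   # prints: [] NR=0 NF=0
```

The brackets make the empty $0 visible, which a bare print would only show as a blank line.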

The END block doesn’t do anything to $0, which keeps its last value. In your AWK script, that’s just the last line read: AWK reads all its input line by line and does its usual processing for each record (assigning $0, splitting fields, and so on), but the script has no block that runs on those records, so nothing ever changes $0. If a block does modify $0, END sees the modified value instead; for example

seq 42 | awk '{ $0 = "21" } END { print }'

outputs 21, not 42, so it’s not the case that “when the END block is run the last line is loaded in $0”.

This isn’t documented in the gawk(1) manpage, but it is documented in mawk(1) (for that implementation of AWK, obviously):

Similarly, on entry to the END actions, $0, the fields and NF have their value unaltered from the last record.
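
A quick way to see the fields and NF surviving into END, as that sentence describes, is the sketch below. The sample input and the variable name end_out are illustrative only; the output shown is what gawk, mawk and current BWK awk should give, since they are the implementations that document preserving these values.

```shell
# The last record is "d e", so END should still see NF=2, $1="d", $2="e".
end_out=$(printf 'a b c\nd e\n' | awk 'END { print NF ": " $1 " " $2 }')
echo "$end_out"   # prints: 2: d e
```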

The GNU AWK manual does mention this behaviour:

In fact, all of BWK awk, mawk, and gawk preserve the value of $0 for use in END rules.

“BWK awk” is Brian Kernighan’s awk, the “one true awk”; it implemented this behaviour in 2005, as documented in its FIXES file:

Apr 24, 2005: modified lib.c so that values of $0 et al are preserved in the END block, apparently as required by posix. thanks to havard eidnes for the report and code.

That change is visible in the “one true awk” history. The latest release of BWK awk behaves in the same way as GNU AWK:

$ echo three fields here | ./awk '{ $0 = "one" } END { print $0 " " NF }'
one 1
$ echo three fields here | ./awk 'END { $0 = "one"; print $0 " " NF }'
one 1
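
Since the FIXES entry shows that BWK awk only started preserving $0 in 2005, a script that might run on an older or less common awk can avoid relying on the behaviour by saving each record itself. This is a minimal sketch of that idiom, not something taken from the manuals quoted above; the variable names last and last_out and the sample input are illustrative.

```shell
# Copy every record into an ordinary variable; END then prints the final copy
# without depending on whether the implementation keeps $0 alive.
last_out=$(printf 'one\ntwo\nthree\n' | awk '{ last = $0 } END { print last }')
echo "$last_out"   # prints: three
```

An ordinary variable keeps its value into END in every awk, so this version behaves the same whether or not $0 is preserved.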