Is there a way to stop processing a single line in awk? Is there something like break or continue that works on pattern-action pairs rather than control structures within an action?
Suppose I have the following input.txt file and I'm trying to replace each of the names with x0, x1, x2, ... . However, I want to leave lines beginning with a space or - alone.
-- data
bob 4
joe 5
bob 6
joe 7
becomes:
-- data
x0 4
x1 5
x0 6
x1 7
And I have the following script that does it. (As a side note, there's probably a better way of structuring this using a heredoc rather than a massive string literal.)
#!/bin/sh
awk '
    BEGIN { c = 0; }
    # do not process lines beginning with - or space
    /^[- ]/ {
        print;
    }
    # update
    /^[^- ]/ {
        if (! ($1 in name) ) {
            new_name = "x" c;
            c += 1;
            name[$1] = new_name;
        }
        $1 = name[$1];
        print;
    }
' input.txt
This script leaves a bit to be desired. First of all, we know that /^[- ]/ and /^[^- ]/ are mutually exclusive, but that property isn't enforced anywhere. I'd like to be able to use something like break to abandon processing the line after the first match.
/^[- ]/ {
    print;
    break;
}
I'd like to be able to add another clause to alert the user to a problem if there is a non-empty line that doesn't match either of the first two patterns.
/./ {
    print "non-empty line!" > "/dev/stderr"
    # or print "non-empty line!" > "/dev/tty" if portability is a concern
}
However, if I add this pattern-action pair to the script as-is, it fires for every non-empty line.
Is there something I can add after the first two test cases to stop processing the line since it has been "successfully" handled? If that isn't possible, is there a common awk idiom for a catch-all case?
Best Answer
You may use the awk statement next to immediately continue with processing the next input record.
Here's an alternative implementation of your awk script:
In the script:
c is the counter. It will be zero from the start.
n is the associative array holding the new labels/names. It is indexed with the data from the file's first field/column.
!n[$1] will be true if the data in the first field has not already been assigned a new label/name.
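A minimal sketch of how next fits together with the pieces described above (the counter c, the array n, and the test !n[$1]) might look like the following. It is an illustration built from that description, not necessarily the exact script the answer had in mind; the function name relabel is hypothetical, and the sample input is the one from the question.

```sh
#!/bin/sh
# Sketch only: relabel is a hypothetical wrapper name used for illustration.
relabel() {
    awk '
        # Lines beginning with "-" or a space: print them unchanged, then
        # "next" skips every remaining rule for this record.
        /^[- ]/ { print; next }

        # First time this name is seen: assign the next label x0, x1, ...
        # c is never initialized, so it counts as 0 the first time it is used.
        !n[$1] { n[$1] = sprintf("x%d", c++) }

        # Replace the name with its label and print the rebuilt record.
        { $1 = n[$1]; print }
    ' "$@"
}

# Feed the sample input from the question through the function.
printf '%s\n' '-- data' 'bob 4' 'joe 5' 'bob 6' 'joe 7' | relabel
```

Because next abandons the current record, any rule placed after a rule that ends in next runs only for records that rule did not handle. A catch-all such as /./ { ... } at the end of the script would therefore fire only for leftovers, provided every rule that handles a line finishes with next. Note that this sketch does not treat empty lines specially.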