Stop processing single line in awk after successful match


Is there a way to stop processing a single line in awk? Is there something like break or continue that works on pattern-action pairs rather than control structures within an action?

Suppose I have the following input.txt file and I'm trying to replace each name with x0, x1, x2, and so on, reusing the same label whenever a name repeats. However, I want to leave lines beginning with a space or - unchanged.

-- data
bob     4
joe     5
bob     6
joe     7

becomes:

-- data
x0 4
x1 5
x0 6
x1 7

I have the following script that does it. (As a side note, there's probably a better way to structure this, using a heredoc rather than a massive string literal.)

#!/bin/sh
awk '
    BEGIN { c = 0; }

    # do not process lines beginning with - or space
    /^[- ]/ {
        print;
    }

    # replace the name in the first field with its label, assigning a new label on first sight
    /^[^- ]/ {
        if (! ($1 in name) ) {
            new_name = "x" c;
            c += 1;
            name[$1] = new_name;
        }
        $1 = name[$1];
        print;
    }
' input.txt
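Following up on the heredoc side note, here is a minimal sketch, assuming a POSIX sh. It stores the same awk program in a quoted heredoc that is captured into a shell variable. The variable name prog is only an illustrative choice. Quoting the delimiter ('EOF') prevents the shell from expanding anything inside the awk program.

```shell
#!/bin/sh
# Recreate the sample input (writes input.txt in the current directory).
cat > input.txt <<'EOF'
-- data
bob     4
joe     5
bob     6
joe     7
EOF

# The quoted delimiter keeps $1, $0, etc. away from the shell, so the
# awk program needs no surrounding single quotes.
prog=$(cat <<'EOF'
BEGIN { c = 0 }

# do not process lines beginning with - or space
/^[- ]/ { print }

# replace the name in the first field with its label
/^[^- ]/ {
    if (!($1 in name)) {
        name[$1] = "x" c
        c++
    }
    $1 = name[$1]
    print
}
EOF
)

awk "$prog" input.txt
```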

This script leaves a bit to be desired. First of all, we know that /^[- ]/ and /^[^- ]/ are mutually exclusive, but that property isn't enforced anywhere. I'd like to be able to use something like break to abandon processing the line after the first match.

/^[- ]/ {
    print;
    break;   # wished-for behavior; awk's break only applies inside loops
}
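awk has no break for pattern-action pairs. One way to make the mutual exclusion explicit anyway is to drop the patterns and dispatch inside a single action with if/else, where only the first matching branch runs. This is a sketch of that restructuring, not the approach used in the answer. It recreates input.txt from the sample above, and the prog variable name is illustrative.

```shell
#!/bin/sh
# Recreate the sample input (writes input.txt in the current directory).
cat > input.txt <<'EOF'
-- data
bob     4
joe     5
bob     6
joe     7
EOF

# One rule with no pattern runs for every line; the if/else chain
# guarantees that at most one branch handles each line.
prog='
BEGIN { c = 0 }
{
    if ($0 ~ /^[- ]/) {
        print                       # leave these lines unchanged
    } else if ($0 ~ /^[^- ]/) {
        if (!($1 in name)) {        # first time this name is seen
            name[$1] = "x" c
            c++
        }
        $1 = name[$1]
        print
    } else if ($0 != "") {
        print "unhandled non-empty line: " $0 > "/dev/stderr"
    }
}'

awk "$prog" input.txt
```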

I'd like to be able to add another clause to alert the user to a problem if there is a non-empty line that doesn't match either of the first two patterns.

/./ {
    print "non-empty line!" > "/dev/stderr"
    # or print "non-empty line!" > "/dev/tty" if portability is a concern
}

However, if I add this pattern-action pair to the script as-is, it fires for every non-empty line, including the ones the first two rules already handled.
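The following minimal demonstration shows this behavior. awk tests every pattern against each record and runs every action whose pattern matches, so a single line can trigger several rules. The rule texts are illustrative.

```shell
#!/bin/sh
# Both patterns match "bob     4", so both actions run for the same record.
prog='
/^[^- ]/ { print "rule 1 matched: " $0 }
/./      { print "rule 2 matched: " $0 }
'
echo 'bob     4' | awk "$prog"
```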

Is there something I can add after the first two test cases to stop processing the line since it has been "successfully" handled? If that isn't possible, is there a common awk idiom for a catch-all case?

Best Answer

Use the awk statement next. It stops processing the current input record immediately, skips every remaining pattern-action pair, and continues with the next record.
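Applied to the question, next lets each handled case end processing for its line. This also provides the catch-all the question asks about, because a final /./ rule is reached only by non-empty lines that no earlier rule consumed. The sketch below reuses the sample input (recreated as input.txt) and the question's /dev/stderr message. The prog variable name is illustrative.

```shell
#!/bin/sh
# Recreate the sample input (writes input.txt in the current directory).
cat > input.txt <<'EOF'
-- data
bob     4
joe     5
bob     6
joe     7
EOF

# Every handled case ends with next, so later rules never see that line.
prog='
/^[- ]/ { print; next }
/^[^- ]/ {
    if (!($1 in n))
        n[$1] = sprintf("x%d", c++)
    $1 = n[$1]
    print
    next
}
/./ { print "unhandled non-empty line: " $0 > "/dev/stderr" }
'

awk "$prog" input.txt
```

With these two patterns, every non-empty line is handled, so the catch-all stays silent on this input. It only becomes reachable if one of the earlier patterns is later narrowed.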

Here's an alternative implementation of your awk script:

awk '/^[- ]/ { print; next } !n[$1] { n[$1] = sprintf("x%d", c++) } { $1 = n[$1]; print }' data.in

Spread over several lines, the script is:

/^[- ]/ { print; next }
!n[$1]  { n[$1] = sprintf("x%d", c++) }
        { $1 = n[$1]; print }
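To check the script against the sample input, the following sketch recreates the input as data.in (the file name used above) and runs the one-liner unchanged. The comments show the expected output.

```shell
#!/bin/sh
# Recreate the question's sample input as data.in in the current directory.
cat > data.in <<'EOF'
-- data
bob     4
joe     5
bob     6
joe     7
EOF

awk '/^[- ]/ { print; next } !n[$1] { n[$1] = sprintf("x%d", c++) } { $1 = n[$1]; print }' data.in
# -- data
# x0 4
# x1 5
# x0 6
# x1 7
```

The sample input has no blank lines. An empty line would fall through to the last two rules with $1 equal to "" and come out as a fresh label (for example, x2). If blank lines are possible, a rule such as /^$/ { print; next } placed first would pass them through unchanged.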

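c is the counter. It starts at zero because an uninitialized awk variable is 0 in a numeric context (c++ and the %d format both use it as a number), so no BEGIN block is needed. Because c++ returns the value before incrementing, the first label is x0.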
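The following small sketch illustrates the initialization rule. An uninitialized variable is 0 in a numeric context but the empty string in a string context. With plain concatenation such as "x" c, an unset c would therefore produce the label x rather than x0, while the numeric uses in the script (c++ and %d) give x0.

```shell
#!/bin/sh
# Only a BEGIN block, so awk exits without reading any input.
prog='BEGIN {
    print "x" c            # string context: unset c is "", prints x
    printf "x%d\n", c++    # numeric context: prints x0, then c becomes 1
    printf "x%d\n", c++    # prints x1, then c becomes 2
    print c                # prints 2
}'
awk "$prog"
```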

n is the associative array holding the new labels. It is indexed by the value of the first field.

!n[$1] is true if the value in the first field has not yet been assigned a label. Every label starts with "x", so an assigned label is never empty and always tests as true.
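Merely referencing n[$1] has a side effect: it creates the element, with an empty value, if it does not already exist. The in operator, as used in the question's script, tests membership without creating anything. In the script above, the new element is assigned a label immediately, so the side effect is harmless there. The following demonstration uses illustrative keys:

```shell
#!/bin/sh
prog='BEGIN {
    # Looking up a missing element yields "" (false), so this prints...
    if (!n["bob"]) print "bob has no label yet"
    # ...but the lookup itself created n["bob"] with an empty value.
    if ("bob" in n) print "bob is now a key of n"
    # The in operator tests membership without creating the element.
    if (!("joe" in n)) print "joe is not a key"
    if (!("joe" in n)) print "joe is still not a key"
}'
awk "$prog"
```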
