Stop processing single line in awk after successful match

awk

Is there a way to stop processing a single line in awk? Is there something like break or continue that works on pattern-action pairs rather than control structures within an action?

Suppose I have the following input.txt file and I'm trying to replace each of the names with x0, x1, x2, .... However, I want to leave lines beginning with a space or - alone.

-- data
bob     4
joe     5
bob     6
joe     7

becomes:

-- data
x0 4
x1 5
x0 6
x1 7

And I have the following script that does it. (As a side note, there's probably a better way of structuring this using a heredoc rather than a massive string literal).

#!/bin/sh
awk '
    BEGIN { c = 0; }

    # do not process lines beginning with - or space
    /^[- ]/ {
        print;
    }

    # replace the name in the first field with its xN label
    /^[^- ]/ {
        if (! ($1 in name) ) {
            new_name = "x" c;
            c += 1;
            name[$1] = new_name;
        }
        $1 = name[$1];
        print;
    }
' input.txt

This script leaves a bit to be desired. First of all, we know that /^[- ]/ and /^[^- ]/ are mutually exclusive, but that property isn't enforced anywhere. I'd like to be able to use something like break to abandon processing the line after the first match.

/^[- ]/ {
    print;
    break;
}

I'd like to be able to add another clause to alert the user to a problem if there is a non-empty line that doesn't match either of the first two patterns.

/./ {
    print "non-empty line!" > "/dev/stderr"
    # or print "non-empty line!" > "/dev/tty" if portability is a concern
}

However, if I add this pattern-action pair to the script as-is, it fires for every non-empty line, including the ones already handled.

Is there something I can add after the first two test cases to stop processing the line since it has been "successfully" handled? If that isn't possible, is there a common awk idiom for a catch-all case?

Best Answer

You may use the awk statement next to immediately continue with processing the next input record.

Here's an alternative implementation of your awk script:

awk '/^[- ]/ { print; next } !n[$1] { n[$1] = sprintf("x%d", c++) } { $1 = n[$1]; print }' data.in

The script is

/^[- ]/ { print; next }
!n[$1]  { n[$1] = sprintf("x%d", c++) }
        { $1 = n[$1]; print }

c is the counter. It needs no initialization, since an unset awk variable evaluates to zero in a numeric context.

n is the associative array holding the new labels/names. It is indexed with the data from the file's first field/column.

!n[$1] will be true if the data in the first field has not already been assigned a new label/name.
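To also cover the catch-all case from the question, one possible sketch is to end every "handled" rule with next and put the catch-all rule last, so it is reached only when nothing earlier matched. The NF == 2 test, the wrapper function rename_names and the sample input are assumptions made for this illustration; they are not part of the original script.

```shell
#!/bin/sh
# rename_names is a hypothetical wrapper, used only for this illustration.
rename_names() {
    awk '
        # Lines starting with "-" or a space pass through unchanged.
        /^[- ]/ { print; next }

        # Assumption: a valid data line has exactly two fields.
        NF == 2 {
            if (!($1 in n)) n[$1] = sprintf("x%d", c++)
            $1 = n[$1]
            print
            next
        }

        # Catch-all: reached only if no earlier rule ran next.
        /./ { print "unexpected line " NR ": " $0 > "/dev/stderr" }
    '
}

# Made-up sample input; "oops" matches none of the first two rules.
printf '%s\n' '-- data' 'bob     4' 'joe     5' 'oops' 'bob     6' | rename_names
```

Because each earlier rule ends in next, the order of the rules now enforces the mutual exclusion that the original pair of regular expressions only implied.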

Explanation

-v n=2 defines the field number to copy when the pattern is found.
/^name/ {a=$(n); print; next}: if the line starts with the given pattern, store the given field and print the line.
{print a, $0}: otherwise, print the current line with the stored value first.

You can generalize the pattern part into something like:

awk -v n=2 -v pat="name" '$1==pat {a=$(n); print; next} {print a, $0}' file
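As a rough illustration of how that generalized command behaves, here is a run on made-up input. The wrapper function prefix_with_field and the sample lines are hypothetical; only the awk program itself comes from the text above.

```shell
#!/bin/sh
# prefix_with_field is a hypothetical wrapper around the command shown above.
prefix_with_field() {
    awk -v n=2 -v pat="name" '
        # Header line: remember field n, print the line as-is, skip the next rule.
        $1 == pat { a = $(n); print; next }
        # Any other line: print it with the remembered value in front.
        { print a, $0 }
    '
}

# Made-up sample input with two "name" header lines.
printf '%s\n' 'name alice' 'foo 1' 'name bob' 'baz 3' | prefix_with_field
```

With this input the header lines come out unchanged and every other line is prefixed with the most recent header value, for example "alice foo 1".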

Awk ifs and variables – cannot pass a variable from one line towards subsequent lines

As pointed out by taliezin, your mistake was to use $ to expand path when printing. Unlike bash or make, awk doesn't use $ to expand variable names to their values, but to refer to the fields of a line (similar to perl).
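A small sketch of the difference, using a made-up input line and a hypothetical wrapper function show_dollar:

```shell
#!/bin/sh
# show_dollar is a hypothetical helper, used only for this illustration.
show_dollar() {
    awk '{
        i = 2
        path = "/tmp"
        print i        # the variable itself: 2
        print $i       # field number i, i.e. the second field
        print path     # the variable itself: /tmp
        print $path    # "/tmp" converts to the number 0, so this is $0, the whole line
    }'
}

echo 'alpha beta gamma' | show_dollar
```

The last print shows why a stray $ in front of a string variable tends to echo the whole input line rather than fail with an error.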

So just removing the $ in front of path will make your code work:

BEGIN{
path=""
}
{
    if ($1 ~ /\:/)
        {
        sub(/\:/,"",$1)
        if (substr($1, length($1), 1) ~ /\//)  # a bare "length" would measure $0, not $1
            {
            path=$1;
            }
        else
            {
            path=$1"/"
            }
        }
    else if (length($0) == 0)
        {}
    else
        print path$1
}

However, this is not really an awkish solution. First of all, there is no need to initialize path in a BEGIN rule: undefined variables default to "" or 0, depending on context.
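A minimal sketch of those defaults (the variable names s and n and the wrapper show_defaults are made up for the illustration):

```shell
#!/bin/sh
# show_defaults is a hypothetical helper; s and n are never assigned anywhere.
show_defaults() {
    awk 'BEGIN {
        print "[" s "]"    # string context: s is the empty string
        print n + 0        # numeric context: n is 0
        print length(s)    # the empty string has length 0
    }'
}

show_defaults
```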

Also, any awk script consists of patterns and actions, the former stating when, the latter what to do. You have one action that's always executed (empty pattern) and that internally uses (nested) conditionals to decide what to do.

My solution would look like this:

# BEGIN is actually a pattern making the following rule run only once:
# That is, before any input is read.
BEGIN{
  # Split lines into chunks (fields) separated by ":".
  # This is done by setting the field separator (FS) variable accordingly:
  # FS=":"  # this would split lines into fields by ":"

  # Additionally, if a field ends with "/",
  # we consider this part of the separator.
  # So fields should be split by a ":" that *might*
  # be preceded by a "/".
  # This can be done using a regular expression (RE) FS:
  FS="/?:"  # "?" means "the previous character may occur 0 or 1 times"

  # When printing, we want to join the parts of the paths by "/".
  # That's the sole purpose of the output field separator (OFS) variable:
  OFS="/"
}

# First we want to identify records (i.e. in this [default] case: lines),
# that contain(ed) a ":".
# We can do that without any RE matching, since records are
# automatically split into fields separated by ":".
# So asking >>Does the current line contain a ":"?<< is now the same
# as asking >>Does the current record have more than 1 field?<<.
# Luckily (but not surprisingly), the number of fields (NF) variable
# keeps track of this:
NF>1{  # The following action is run only if there are >1 fields.

  # All we want to do in this case, is store everything up to the first ":",
  # without the potential final "/".
  # With our FS choice (see above), that's exactly the 1st field:
  path=$1
}

# The printing should be done only for non-empty lines not containing ":".
# In our case, that translates to a record that has neither 0 nor >1 fields:
NF==1{  # The following action is only run if there is exactly 1 field.

  # In this case, we want to print the path variable (no need for a "$" here)
  # followed by the current line, separated by a "/".
  # Since we defined the proper OFS, we can use "," to join output fields:
  print path,$1  # ($1==$0 since NF==1)
}

And that's all. Removing all the comments, shortening the variable name and moving the [O]FS definitions to command line arguments, all you have to write is:

awk -F'/?:' -v OFS=/ 'NF>1{p=$1}NF==1{print p,$1}' structure-of-home.cnf
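To see the one-liner at work without the original file, here is a sketch that feeds it made-up input on stdin. The contents of structure-of-home.cnf are not shown in this answer, so the sample lines and the wrapper function join_paths are assumptions.

```shell
#!/bin/sh
# join_paths is a hypothetical wrapper around the one-liner above, reading stdin.
join_paths() {
    awk -F'/?:' -v OFS=/ 'NF>1{p=$1} NF==1{print p,$1}'
}

# Made-up input: two directory headers (one with a trailing "/"),
# an empty line, and one file name under each header.
printf '%s\n' '/home/user:' 'file1' '' 'docs/:' 'file2' | join_paths
```

Both header styles give the same kind of result, because the optional "/" before the ":" is consumed as part of the field separator: the output is /home/user/file1 followed by docs/file2, and the empty line is skipped since it has NF equal to 0.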
