Awk ifs and variables – cannot pass a variable from one line to subsequent lines

awk

First of all, I'm new to awk, so please excuse me if it's something simple.

I'm trying to generate a file that contains paths. For this I'm using an ls -LT listing together with an awk script:

This is an example of the input file:

vagrant@precise64:/vagrant$ cat structure-of-home.cnf

/home/:

vagrant

/home/vagrant:

postinstall.sh

This would be the expected output:

/home/vagrant
/home/vagrant/postinstall.sh

The awk script should do the following:

  1. Check whether the line has a : in it
  2. If yes, assign the string (without the :) to a variable ($path in my case)
  3. If the line is empty print nothing
  4. If it's not empty and it does not contain a : print the $path and then the current line $0

Here's the script:

BEGIN{
path=""
}
{
    if ($1 ~ /\:/)
        {
        sub(/\:/,"",$1)
        if (substr($1, length,1) ~ /\//)
            {
            path=$1;
            }
        else
            {
            path=$1"/"
            }
        }
    else if (length($0) == 0)
        {}
    else
        print $path$1
}

The problem is that when I run the script I get the following mess:

vagrant@precise64:/vagrant$ awk -f format_output.awk structure-of-home.cnf
vagrantvagrant
postinstall.shpostinstall.sh

What am I doing wrong please?

Best Answer

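As pointed out by taliezin, your mistake was to use $ to expand path when printing. Unlike bash or make, awk doesn't use $ to expand variable names to their values, but to refer to the fields of a line (similar to perl). Because path holds a non-numeric string, which counts as 0 when used as a field number, $path is the same as $0 (the whole line). That is why each name was printed twice.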
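A minimal sketch of that behaviour, run from the shell. The input line and the variable names n and s are made up for illustration:

```shell
# A bare name gives the variable's value; "$" in front of it selects a field.
out=$(printf 'a b c\n' | awk '{
  n = 2
  s = "/home/"    # non-numeric string, so its numeric value is 0
  print n         # the value of n
  print $n        # field number 2
  print $s        # field number 0, i.e. the whole line
}')
printf '%s\n' "$out"
```

This should print 2, then b, then a b c.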

So just removing the $ in front of path makes your code work:

BEGIN{
path=""
}
{
    if ($1 ~ /\:/)
        {
        sub(/\:/,"",$1)
        if (substr($1, length,1) ~ /\//)
            {
            path=$1;
            }
        else
            {
            path=$1"/"
            }
        }
    else if (length($0) == 0)
        {}
    else
        print path$1
}

However, this is not really an awkish solution. First of all, there is no need to initialize path in a BEGIN rule: undefined variables default to "" or 0, depending on context.
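A quick way to convince yourself of the defaults (a sketch; s and n are just placeholder names that are never assigned):

```shell
out=$(awk 'BEGIN {
  print "[" s "]"     # string context: s acts as the empty string
  print n + 1         # numeric context: n acts as 0
  print length(s)     # the empty string has length 0
}')
printf '%s\n' "$out"
```

This should print [], then 1, then 0.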

Also, any awk script consists of patterns and actions, the former stating when to act, the latter what to do. You have one action that is always executed (empty pattern) and that internally uses (nested) conditionals to decide what to do.
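As an intermediate step, here is a sketch of the same logic with the conditionals moved into patterns. It is only one possible arrangement, and it works on $0 instead of $1, which is an assumption on my part (it only differs for names containing spaces). The sample input is fed in with printf:

```shell
out=$(printf '/home/:\n\nvagrant\n\n/home/vagrant:\n\npostinstall.sh\n' | awk '
/:/ {                   # pattern: the line contains a ":"
  sub(/:/, "")          # drop the ":" from the line
  path = $0
  if (path !~ /\/$/)    # make sure path ends in "/"
    path = path "/"
  next                  # nothing else to do for this line
}
NF {                    # pattern: the line has at least one field
  print path $0
}')
printf '%s\n' "$out"
```

For the sample input this should give /home/vagrant and /home/vagrant/postinstall.sh.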

My solution would look like this:

# BEGIN is actually a pattern making the following rule run only once:
# That is, before any input is read.
BEGIN{
  # Split lines into chunks (fields) separated by ":".
  # This is done by setting the field separator (FS) variable accordingly:
  # FS=":"  # this would split lines into fields by ":"

  # Additionally, if a field ends with "/",
  # we consider this part of the separator.
  # So fields should be split by a ":" that *might*
  # be preceded by a "/".
  # This can be done using a regular expression (RE) FS:
  FS="/?:"  # "?" means "the previous character may occur 0 or 1 times"

  # When printing, we want to join the parts of the paths by "/".
  # That's the sole purpose of the output field separator (OFS) variable:
  OFS="/"
}

# First we want to identify records (i.e. in this [default] case: lines),
# that contain(ed) a ":".
# We can do that without any RE matching, since records are
# automatically split into fields separated by ":".
# So asking >>Does the current line contain a ":"?<< is now the same
# as asking >>Does the current record have more than 1 field?<<.
# Luckily (but not surprisingly), the number of fields (NF) variable
# keeps track of this:
NF>1{  # The following action is run only if there are >1 fields.

  # All we want to do in this case, is store everything up to the first ":",
  # without the potential final "/".
  # With our FS choice (see above), that's exactly the 1st field:
  path=$1
}

# The printing should be done only for non-empty lines not containing ":".
# In our case, that translates to a record that has neither 0 nor >1 fields:
NF==1{  # The following action is only run if there is exactly 1 field.

  # In this case, we want to print the path variable (no need for a "$" here)
  # followed by the current line, separated by a "/".
  # Since we defined the proper OFS, we can use "," to join output fields:
  print path,$1  # ($1==$0 since NF==1)
}

And that's all. Removing all the comments, shortening the variable name and moving the [O]FS definitions to command line arguments, all you have to write is:

awk -F'/?:' -v OFS=/ 'NF>1{p=$1}NF==1{print p,$1}' structure-of-home.cnf
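If you want to see what the regular-expression field separator does to each line, a small sketch like the following prints the field count and the first field. The input is the sample from the question without the blank lines, fed in with printf:

```shell
out=$(printf '/home/:\nvagrant\n/home/vagrant:\npostinstall.sh\n' |
  awk -F'/?:' '{ print NF ": [" $1 "]" }')   # field count, then first field
printf '%s\n' "$out"
```

Directory lines should show 2 fields with the trailing "/" and ":" stripped from the first one ([/home], [/home/vagrant]), while plain names show 1 field.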