Sum in block with AWK (restart the sum when the pattern changes)


I have a file like this:

A 100
A 200
A 300 #sum=600
B 400
B 500 #sum=900
A 600
A 700
A 800 #sum=2100

I would like the output to be:

A 600
B 900
A 2100
C sum_of_C
D sum_of_D

I can already do this with a combination of for, sed, grep and awk.

However because I am learning awk, I would like to write an awk script. So far I have:

if (${NR {print $1}} == ${NR-1 {print $1}}) 
  sum+=$2
  print $0"\t"sum
else
  sum=$2
  print $0"\t"sum

Running awk -f awkscript file did not work. What is the solution?

Best Answer

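I'm not completely sure what your if is trying to do there. ${...} is shell syntax, not awk, and awk has no way to refer to "the record at NR-1". NR is the number of the current record (the count of records read so far); use NF for the number of fields, if that's what you're aiming for. You also can't put {} blocks in the middle of an expression like that: a { ... } block either follows a pattern at the top level or groups statements, for example as the body of an if.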
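As a sketch of what the if/else in the question seems to be attempting (print every line followed by a running sum that restarts when $1 changes), the usual awk approach is to remember the previous $1 in a variable, since awk cannot look back at an earlier record. The variable name prev and the sample lines are chosen for this example only.

```shell
#!/bin/sh
# Sketch: the question's if/else rewritten as valid awk.
# prev is a name chosen for this example; it holds $1 from the previous line.
out=$(printf '%s\n' 'A 100' 'A 200' 'B 400' 'B 500' 'A 600' |
awk '
{
    if (NR > 1 && $1 == prev)
        sum += $2          # same group as the previous line: keep adding
    else
        sum = $2           # first line or a new group: restart the sum
    print $0 "\t" sum      # the line, a tab, then the running sum
    prev = $1              # remember this group for the next line
}
')
printf '%s\n' "$out"
```

This prints every line with its running total; the script below prints only one total per group, which is what the desired output asks for.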

I think what you're aiming for is to compare the value of a field in this line with a field in the previous line, printing out the sum when we reach a new "group" of data. If that's the case, this script will do what you want and I think equates pretty much to what you were aiming for:

{
    if (last && $1 != last) {
        print last, sum
        sum = 0
    }
    sum = sum + $2
    last = $1
}
END {
    print last, sum
}

We make a new variable last to hold the value of the first field ($1) on the previous line. We'll use that to track which group we're looking at.

For every line (because we have { ... } at the top level), we first test whether a) last is set (because we don't want to print anything on the very first line), and b) the value of the first field is different from last. If both hold, we print the value of last, the output field separator (a space by default, which is what the , produces), and the sum we've calculated. (If you want a tab, use "\t" in quotes like you had, or set OFS to "\t".)
After printing, we reset sum to zero.
Either way, we add the value of the second field ($2) to sum.
For every line, we save the first field (our group) into last, so we can use it for comparison on the next line.
Finally, we want to print the last group as well. For that, we use an END { ... } block, which runs once, after the last line of input has been read. We print the group and its sum just as we did before.
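A minimal sketch of the same script run inline from the shell, with the sample data from the question piped in through printf instead of read from a file (that choice is just for this example). It also shows the tab option: -v OFS='\t' changes what the , in print emits, so no other change to the program is needed.

```shell
#!/bin/sh
# Sketch: the group-sum script above, run inline on the question's sample data.
# -v OFS='\t' makes the "," in print emit a tab instead of a space.
out=$(printf '%s\n' 'A 100' 'A 200' 'A 300 #sum=600' 'B 400' 'B 500 #sum=900' \
                    'A 600' 'A 700' 'A 800 #sum=2100' |
awk -v OFS='\t' '
{
    if (last && $1 != last) {   # a new group starts: report the previous one
        print last, sum
        sum = 0
    }
    sum = sum + $2
    last = $1
}
END { print last, sum }         # report the final group
')
printf '%s\n' "$out"
```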

If I run:

awk -f sum.awk < data

with your data file, I get this output:

A 600
B 900
A 2100

as desired.

There are simpler ways to do this, both in awk and otherwise. In particular, we can replace the main block above (keeping the END block) with:

last && $1 != last {
    print last, sum
    sum = 0
}
{
    sum = sum + $2
    last = $1
}

Here we use awk's conditional block syntax rather than an explicit if test: the behaviour of this program is identical to the one above, but it's more idiomatic. It's not hugely different in this example, but it's useful to know about if you're learning awk.
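Here is a sketch of the pattern-action version as one complete program, END block included. It swaps the "last is set" test for NR > 1; that swap is an assumption of this sketch, not part of the answer above. The reason: a group key of 0 read from input counts as false in awk, so testing last would skip the first group change. The numeric sample data is made up to show exactly that case.

```shell
#!/bin/sh
# Sketch: pattern-action form with NR > 1 instead of testing "last".
# With keys like "0", testing "last" alone would be false and miss a group change.
out=$(printf '%s\n' '0 1' '0 2' '1 5' '1 5' '0 3' |
awk '
NR > 1 && $1 != last { print last, sum; sum = 0 }   # group changed: report it
{ sum += $2; last = $1 }                              # accumulate every line
END { if (NR > 0) print last, sum }                   # final group, if any input
')
printf '%s\n' "$out"
```

The if (NR > 0) guard in END also keeps an empty input from printing a blank line.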

If your file literally contains the #sum= annotations (or something similar) on the last line of each group, you can use this script:

{
    sum = sum + $2
    if (NF == 3) {
        print $1, sum
        sum = 0
    }
}

For every line, this adds the value of the second field to the sum variable. On lines that have exactly three fields (NF == 3), we print out our total, and reset sum to zero.
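If the annotations are reliable, a variant of this idea keys on the marker itself rather than on the field count, so an unrelated third field would not trigger a total. Matching $3 against /^#sum=/ is an assumption about what the marker always looks like.

```shell
#!/bin/sh
# Sketch: print a total whenever the third field is a "#sum=" marker.
out=$(printf '%s\n' 'A 100' 'A 200' 'A 300 #sum=600' 'B 400' 'B 500 #sum=900' |
awk '
{ sum += $2 }                                  # add every line to the total
$3 ~ /^#sum=/ { print $1, sum; sum = 0 }       # marker line: report and reset
')
printf '%s\n' "$out"
```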

Related Solutions

Grepping for a block of text with parts that can be optional

This should do it, I hope. Events go to a file named events, and messages go to stdout.

Save this file to myprogram.awk (for example):

#!/usr/bin/awk -f

BEGIN {
   s = 0                  # state: 1 while parsing inside an event
   nevent = 0             # number of events seen so far
   printf "" > "events"   # create (or truncate) the events file
}

# Start of event
/^ *Data control raising event/ {
   s = 1
   dentries = 0   # DataChangeEntry lines seen in this event
   nevent++       # count first, so the header and the warnings use the same number
   print "*** Event number: " nevent >> "events"
}

# Every line inside an event (including its first and last lines) goes to the events file
s == 1 {
   print >> "events"
}

# DataChangeEntry line inside an event
s == 1 && /^ *==== DataChangeEntry/ {
   dentries++
}

# End of event
s == 1 && /^ *\]\]/ {
   s = 0
   print "" >> "events"
   if (dentries == 0) {
      print "Warning: Event " nevent " has no Data Entries"
   }
}

END {
   print "Total event count: " nevent
}

You can invoke it in either of these ways (the first requires the file to be executable, e.g. after chmod +x myprogram.awk):

./myprogram.awk inputfile.txt
awk -f myprogram.awk inputfile.txt

Sample output:

Warning: Event 3 has no Data Entries
Total event count: 3

All the events are collected in the file named events in the current working directory.
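A sketch that exercises a compact variant of the event parser on a made-up input. The input lines are hypothetical; only the prefixes the patterns look for ("Data control raising event", "==== DataChangeEntry", "]]") matter. In this variant nevent is incremented before the header is written, so the events file and the warnings use the same 1-based number. It runs in a temporary directory so the events file does not overwrite anything.

```shell
#!/bin/sh
# Sketch: run a compact event parser on hypothetical input in a temp directory.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

cat > inputfile.txt <<'EOF'
noise before any event
Data control raising event A [[
  ==== DataChangeEntry 1
]]
Data control raising event B [[
  no entries in this one
]]
EOF

out=$(awk '
BEGIN { s = 0; nevent = 0; printf "" > "events" }
/^ *Data control raising event/ {
    s = 1; dentries = 0; nevent++
    print "*** Event number: " nevent >> "events"
}
s == 1 { print >> "events" }                        # copy event lines
s == 1 && /^ *==== DataChangeEntry/ { dentries++ }  # count entries
s == 1 && /^ *\]\]/ {                               # end of event
    s = 0
    print "" >> "events"
    if (dentries == 0) print "Warning: Event " nevent " has no Data Entries"
}
END { print "Total event count: " nevent }
' inputfile.txt)

printf '%s\n' "$out"   # messages written to stdout
cat events             # the collected events
```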

Filter YAML file content using sed/awk

An sed solution:

sed -nEe '/\[(prod|dev)_env]/!d;N;:loop' -e 's/.*\n//;${p;d;};N;P;/\n\[/D;bloop' hosts.yml

/\[(prod|dev)_env]/!d drops all lines until [prod_env] or [dev_env] is found
N;:loop appends the next line and starts a loop
inside the loop, s/.*\n// removes the first of the two lines, because it is either the [...env] line or a line we already printed in the previous loop cycle
${p;d;} prints the remaining line if we have reached the last line of input
N;P appends the next line and prints the current one
/\n\[/D checks whether the next line starts with [. If so, the first line in the buffer (already printed) is discarded and we start over with that [ line
bloop otherwise branches back to loop

Instead of appending the next line to the buffer, printing it, and removing the old one, you could go line by line, but that would require another loop, because you can't start over with D.
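For comparison, a sketch of roughly the same filter in awk, run on a hypothetical hosts.yml content piped in with printf. It assumes, as the sed command does, that a section ends at the next line starting with [ and that the section headers themselves are not printed; unlike the sed version, it only treats [prod_env] and [dev_env] as headers when they start the line.

```shell
#!/bin/sh
# Sketch: awk version of the section filter on made-up hosts.yml content.
out=$(printf '%s\n' '[all]' 'h0' '[prod_env]' 'h1' 'h2' '[test_env]' 'h3' \
                    '[dev_env]' 'h4' |
awk '
/^\[/ { keep = ($0 ~ /^\[(prod|dev)_env\]/); next }   # header: decide, then skip it
keep                                                   # print lines in kept sections
')
printf '%s\n' "$out"
```

Tracking a keep flag lets awk handle one line at a time, which avoids the buffer juggling the sed version needs.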
