I have a file like this:
A 100
A 200
A 300 #sum=600
B 400
B 500 #sum=900
A 600
A 700
A 800 #sum=2100
I would like the output to be:
A 600
B 900
A 2100
C sum_of_C
D sum_of_D
I can do that with for, sed, grep and awk. However, because I am learning awk, I would like to write an awk script. So far I have:
if (${NR {print $1}} == ${NR-1 {print $1}})
sum+=$2
print $0"\t"sum
else
sum=$2
print $0"\t"sum
awk -f awkscript file was not successful. What is the solution?
Best Answer
I'm not completely sure what your if is trying to do there. NR is the number of records read so far (in effect, the current line number); use NF for the number of fields, if that's what you're aiming for. You can't put {} blocks in the middle of things like that.
I think what you're aiming for is to compare the value of a field in this line with a field in the previous line, printing out the sum when we reach a new "group" of data. If that's the case, this script will do what you want and I think equates pretty much to what you were aiming for:
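A minimal sketch of such a script, reconstructed from the walkthrough that follows. The variable names last and sum come from that walkthrough; the shell wrapper, the data variable and the explicit last != "" test are assumptions made here so the example runs on its own.

```sh
# Sample data copied from the question; "data" is a hypothetical variable name.
data='A 100
A 200
A 300 #sum=600
B 400
B 500 #sum=900
A 600
A 700
A 800 #sum=2100'

# The awk program is passed inline; saved to a file, it could be run with awk -f instead.
result=$(printf '%s\n' "$data" | awk '
{
    # A new group starts when the first field differs from the previous line.
    if (last != "" && $1 != last) {
        print last, sum   # the comma inserts the output field separator (a space)
        sum = 0
    }
    sum += $2             # accumulate the second field
    last = $1             # remember this group for the next line
}
END {
    print last, sum       # flush the final group
}')
printf '%s\n' "$result"
```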
We make a new variable last to hold the value of the first field ($1) on the previous line. We'll use that to track which group we're looking at.
For every line ({ ... } at the top level), we first test whether a) last is set (because we don't want to print anything on the very first line), and b) the value of the first field is different from last. If it is, we print out the value of last, a space (because of the , in the print statement), and the sum we've calculated. (If you want a tab, use "\t" in quotes like you had.)
In that case we also reset sum to zero.
We then add the value of the second field ($2) to sum.
We update last, so we can use it for comparison on the next line.
Finally, there is an END { ... } block. It runs right at the end of the program when we run out of data. We print out the sum and the group we're working with just like we did before.
If I run this script with your data file, I get this output:
A 600
B 900
A 2100
as desired.
There are simpler ways to do this, both in awk and otherwise. In particular, we can replace the body above with:
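A sketch of what that replacement could look like, keeping the same last and sum variables. The END block is carried over unchanged so the example runs on its own; the shell wrapper and the data variable are assumptions for illustration.

```sh
# Sample data copied from the question; "data" is a hypothetical variable name.
data='A 100
A 200
A 300 #sum=600
B 400
B 500 #sum=900
A 600
A 700
A 800 #sum=2100'

result=$(printf '%s\n' "$data" | awk '
# Rule with a pattern: the action runs only when the group changes.
last != "" && $1 != last {
    print last, sum
    sum = 0
}
# Rule with no pattern: the action runs on every line.
{
    sum += $2
    last = $1
}
END {
    print last, sum
}')
printf '%s\n' "$result"
```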
Here we use awk's conditional block syntax rather than an explicit if test: the behaviour of this program is identical to the one above, but it's more idiomatic. It's not hugely different in this example, but it's useful to know about if you're learning awk.
If the file example you gave is literally what it is, with #sum= lines (or similar), you can use a shorter script. For every line, it adds the value of the second field to the sum variable. On lines that have exactly three fields (NF == 3), we print out our total and reset sum to zero.
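The three-field approach just described can be sketched as follows. This assumes every group ends with a line carrying a third field such as #sum=600, as in the sample; the shell wrapper and the data variable are illustrative assumptions.

```sh
# Sample data copied from the question; "data" is a hypothetical variable name.
data='A 100
A 200
A 300 #sum=600
B 400
B 500 #sum=900
A 600
A 700
A 800 #sum=2100'

result=$(printf '%s\n' "$data" | awk '
# Every line: accumulate the second field.
{ sum += $2 }
# Lines with exactly three fields close a group: print the total and reset it.
NF == 3 {
    print $1, sum
    sum = 0
}')
printf '%s\n' "$result"
```

Unlike the earlier versions, this one never compares group names, so it depends entirely on the marker lines being present.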