I have data in a text file of the form:
1
1 2
1 2
1 4
1 6
2
2 1
2 2
2 3
2 4
3
3 1
3 5
3 9
3 11
For rows with the same ID (1st column) I want to add a column which is sum all the values in column 2 upto the previous row. Which a desired output:
1
1 2
1 2 2
1 4 4
1 6 8
2
2 1 0
2 2 1
2 3 3
2 4 6
3
3 1 0
3 5 1
3 9 6
3 11 14
Which I am close to achieving with:
awk -v OFS='' 'NR == 1 {
next
}
{
print $0, (NR > 1 && p1 == $1 ? " " (sum+=p2) : "")
}
{
p1 = $1
p2 = $2
}' input > output
However this is summing ALL the values in column 2, not just values with the same ID. So The output is correct for ID=1, but obviously gets worse:
1 2
1 2 2
1 4 4
1 6 8
2
2 1 8
2 2 9
2 3 11
2 4 14
3
3 1 14
3 5 15
3 9 20
3 11 29
How can I alter my sum to only include the correct section? (rows with the same ID)
Best Answer
Increment the count after printing the current line.
This takes advantage of awk treating undefined variables as the empty string, or (in numeric context) as zero.
If you don't waant the incremental sum
0
printed, use