Awk – Cumulative Sum of Values in Column with Same ID

awk

I have data in a text file of the form:

1     
1 2 
1 2  
1 4  
1 6  
2     
2 1 
2 2  
2 3 
2 4  
3     
3 1 
3 5   
3 9  
3 11  

For rows with the same ID (1st column) I want to add a column which is sum all the values in column 2 upto the previous row. Which a desired output:

1     
1 2   
1 2   2
1 4   4
1 6   8
2     
2 1   0
2 2   1   
2 3   3
2 4   6
3     
3 1   0
3 5   1
3 9   6
3 11  14

Which I am close to achieving with:

awk -v OFS='' 'NR == 1 {
   next
}
{
   print $0, (NR > 1 && p1 == $1 ? " " (sum+=p2) : "")
}
{
   p1 = $1
   p2 = $2
}' input > output

However this is summing ALL the values in column 2, not just values with the same ID. So The output is correct for ID=1, but obviously gets worse:

1  2
1  2   2
1  4   4
1  6   8
2
2  1   8
2  2   9
2  3   11
2  4   14
3
3  1   14
3  5   15
3  9   20
3  11  29

How can I alter my sum to only include the correct section? (rows with the same ID)

Best Answer

Increment the count after printing the current line.

awk '{print $1, $2, sum[$1]; sum[$1] += $2}' file
1
1 2 0
1 2 2
1 4 4
1 6 8
2
2 1 0
2 2 1
2 3 3
2 4 6
3
3 1 0
3 5 1
3 9 6
3 11 15

This takes advantage of awk treating undefined variables as the empty string, or (in numeric context) as zero.

If you don't waant the incremental sum 0 printed, use

if ($2 != "") sum[$1] += $2
Related Question