Awk – Cumulative Sum of Values in Column with Same ID

awk

I have data in a text file of the form:

For rows with the same ID (1st column), I want to add a column containing the sum of all the values in column 2 up to the previous row. The desired output is:

Which I am close to achieving with:

awk -v OFS='' 'NR == 1 {
   next
}
{
   print $0, (NR > 1 && p1 == $1 ? " " (sum+=p2) : "")
}
{
   p1 = $1
   p2 = $2
}' input > output

However, this sums ALL the values in column 2, not just the values with the same ID. So the output is correct for ID=1, but gets progressively worse after that:

How can I alter my sum to include only the rows with the same ID?

Best Answer

Add the current value to the running sum after printing the current line.

awk '{print $1, $2, sum[$1]; sum[$1] += $2}' file

This takes advantage of awk treating undefined variables as the empty string, or (in numeric context) as zero.

If you don't want the incremental sum 0 printed, use

if ($2 != "") sum[$1] += $2
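
To see what the one-liner produces, here is a small sketch. The original input was not preserved, so the sample rows below and the wrapper name cumsum_by_id are assumptions made up for illustration only.

```shell
# cumsum_by_id is a hypothetical wrapper name; the awk program is the
# one-liner from the answer, unchanged.
cumsum_by_id() {
    awk '{print $1, $2, sum[$1]; sum[$1] += $2}' "$@"
}

# Made-up sample rows (ID, value); rows with the same ID need not be adjacent.
printf '1 10\n1 20\n2 5\n1 30\n2 7\n' | cumsum_by_id
```

Because sum is an array keyed on the ID, each ID keeps its own running total, so interleaved IDs work as well. The first row of each ID gets an empty third field, since nothing has been added for that ID yet.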

Related Solutions

Shell Script – Adding a Column of Values in a Tab Delimited File

You can use a one-liner loop like this:

for f in file1 file2 file3; do sed -i "s/$/\t$f/" "$f"; done

For each file in the list, this uses sed to append a tab and the filename to the end of each line.

Explanation:

The -i flag makes sed perform the replacement in place, overwriting the file.
The substitution has the form s/PATTERN/REPLACEMENT/. In this example PATTERN is $, the end of the line, and REPLACEMENT is \t (a TAB, as interpreted by GNU sed) followed by $f, the filename from the loop variable. The s/// command is within double quotes so that the shell can expand variables.
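
If the sed in use does not understand -i or \t, or a filename contains characters that are special in a sed replacement (such as / or &), the same idea can be sketched with awk and a temporary file. This is a swapped-in technique, not the command above, and the function name append_name_column is hypothetical.

```shell
# Hypothetical helper: append a tab and the file's own name to every line
# of each file given as an argument.
append_name_column() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # FILENAME is awk's built-in name of the current input file;
        # OFS='\t' puts a tab between the original line and the name.
        awk -v OFS='\t' '{ print $0, FILENAME }' "$f" > "$tmp" &&
            cat "$tmp" > "$f"    # copy back, keeping the file's permissions
        rm -f "$tmp"
    done
}
```

It would be called as append_name_column file1 file2 file3. Writing to a temporary file first avoids truncating the input while awk is still reading it.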

Linux Shell Text Processing – How to Sum Up Values of Each Two Rows Across Their Line in Linux

sed '
    N                                                       #append next line
    s/$/))/                                                 #add `))` to end
    s/\(\S*\s*\)\(.*\)\n\1/printf "%016d\n" \$((10#\2+10#/  #check Nos, form line
    t                                                       #to end if Nos equal
    s/))$//                                                 #remove `))`
    D                                                       #delete 1st line
    ' file |
bash

Regarding the 45000-digit number, please note that the maximum number bash can handle is

/* Minimum and maximum values a `signed long int' can hold.  */
#  if __WORDSIZE == 64
#   define LONG_MAX 9223372036854775807L
#  else
#   define LONG_MAX 2147483647L
#  endif

[1] /usr/include/limits.h
