Linux – How to calculate the sum of the data that have the same ID in the first column

awklinuxperltext processing

I want to sum the values of the rows of my data that have the same ID in the first column. My data looks like

data.txt

Id    a    b    c    d    e
1     1    2    3    4    5
1     2    3    4    5    6
1     3    4    5    6    7
2     4    5    6    7    8
2     5    6    7    8    9
2     6    7    8    9    10
3     7    8    9   10    11
3     8    9    10  11    12
3     9    10   11  12    13
3     10   11   12  13    14
4     11   12   13  14    15
4     12   13   14  15    16
5     13   14   15  16    17
5     14   15   16  17    18

Required results

out.txt

Id    a     b   c   d   e
1     6     9   12  15  18
2     15    18  21  24  27
3     34    38  42  46  50
4     23    25  27  29  31
5     27    29  31  33  35

Best Answer

This GNU awk script should do the job:

$ awk 'NR==1 { size=NF;$1=$1;OFS="\t";print;next } 
{ for(i=2;i<=NF;i++) {id[$1]=$1; record[$1,i-1]+=$i} } 
END { PROCINFO["sorted_in"]="@ind_num_asc"
  for(i in id){ printf("%s\t",i);
    for(j=1;j<size;j++) printf("%s\t",record[i,j]);
    printf("\n");
  }
} ' data.txt > out.txt
$ cat out.txt
Id  a   b   c   d   e
1   6   9   12  15  18  
2   15  18  21  24  27  
3   34  38  42  46  50  
4   23  25  27  29  31  
5   27  29  31  33  35

Edit:

Here is a version that preserves the original row ordering instead of sorting the ids:

$ awk 'NR==1 { size=NF;$1=$1;OFS="\t";print;next }
{ if(o[$1]==0) o[$1]=NR
  for(i=2;i<=NF;i++) {record[$1,i-1]+=$i} }
END { PROCINFO["sorted_in"]="@val_num_asc"
  for(i in o){ printf("%s\t",i)
    for(j=1;j<size;j++) printf("%s\t",record[i,j])
    printf("\n") }
}'

Related Solutions

Awk – Cumulative Sum of Values in Column with Same ID

Increment the count after printing the current line.

awk '{print $1, $2, sum[$1]; sum[$1] += $2}' file

This takes advantage of awk treating undefined variables as the empty string, or (in numeric context) as zero.

If you don't waant the incremental sum 0 printed, use

if ($2 != "") sum[$1] += $2

Best Answer

Related Solutions

Awk – Cumulative Sum of Values in Column with Same ID

Related Question