AWK, Sum of category

awkcsv-simple

I have tons of CSV files with similar content. The values are usually comma separated and they look like this.

product_a,  domestic,   500
product_a,  abroad,     15
product_b,  domestic,   313
product_b,  abroad,     35
product_c,  domestic,   411
product_c,  abroad,     84
product_d,  domestic,   25
product_d,  abroad,     2
...

What I'm trying to accomplish, with AWK (because I believe SED is not the right tool for such operations, but I'm relatively new Linux user…) is sum of each product (Column 1), insterted as a column nr 2. I can do something like this

awk -F, '{a[$1]+=$3;}END{for (i in a)print i", "a[i];}' filename

to get those values (sums)

product_a,  515
product_b,  348
product_c,  495
product_d,  27
...

but still I have no idea how to insert them as a second column in the original file, in form like this:

product_a,  515, domestic,  500
product_a,  515, abroad,    15
product_b,  348, domestic,  313
product_b,  348, abroad,    35
product_c,  495, domestic,  411
product_c,  495, abroad,    84
product_d,  27,  domestic,  25
product_d,  27,  abroad,    2
...

I've been using a little bit of sed and awk lately, but my tries usually gets me Error (like: attempt to use a scalar value as array).

Order of rows are not my concern, but I assume that I'll able to use the answer as a batch file command.

$ for f in *.csv; do
  That Shiny Enigmatic Command > tmp && mv tmp $f
  done

EDIT

Thanks to @KM. I've get to the place, where I can do what I want in 3 steps.

1 step:

$ for f in *.csv; do 
awk -F, '{a[$1]+=$3;}END{for (i in a)print i", "a[i];}' $f | sort > sum$f
done

2nd step:

$ for f in [^sum]*.csv; do 
join -t ',' $f sum$f | awk -F, '{print $1"," $4"," $2"," $3}' > tmp && mv tmp $f; 
done

To eventually just rm sum*.*.
Is there is a way to execute it as one command from terminal? Or outside of it?

Best Answer

Save the sum to a file called sum, sorted

awk -F, '{a[$1]+=$3;}END{for (i in a)print i", "a[i];}' filename | sort > sum

cat sum 
product_a, 515
product_b, 348
product_c, 495
product_d, 27

Join the two files, first column of first file with first column of second (think "keys"); pipe it to awk and print reordered columns, using , as field separator (-F) and as the Output Field Separator (-OFS)

join -t ','  -1 1 -2 1 filename sum | awk -F, -OFS=, {'print $1,$4,$2,$3}'

product_a,  515,  domestic,  500
product_a,  515,  abroad,    15
product_b,  348,  domestic,  313
product_b,  348,  abroad,    35
product_c,  495,  domestic,  411
product_c,  495,  abroad,    84
product_d,  27,   domestic,  25
product_d,  27,   abroad,    2
Related Question