I have tons of CSV files with similar content. The values are usually comma separated and they look like this.
product_a, domestic, 500
product_a, abroad, 15
product_b, domestic, 313
product_b, abroad, 35
product_c, domestic, 411
product_c, abroad, 84
product_d, domestic, 25
product_d, abroad, 2
...
What I'm trying to accomplish, with AWK (because I believe SED is not the right tool for such operations, but I'm relatively new Linux user…) is sum of each product (Column 1), insterted as a column nr 2. I can do something like this
awk -F, '{a[$1]+=$3;}END{for (i in a)print i", "a[i];}' filename
to get those values (sums)
product_a, 515
product_b, 348
product_c, 495
product_d, 27
...
but still I have no idea how to insert them as a second column in the original file, in form like this:
product_a, 515, domestic, 500
product_a, 515, abroad, 15
product_b, 348, domestic, 313
product_b, 348, abroad, 35
product_c, 495, domestic, 411
product_c, 495, abroad, 84
product_d, 27, domestic, 25
product_d, 27, abroad, 2
...
I've been using a little bit of sed and awk lately, but my tries usually gets me Error (like: attempt to use a scalar value as array).
Order of rows are not my concern, but I assume that I'll able to use the answer as a batch file command.
$ for f in *.csv; do
That Shiny Enigmatic Command > tmp && mv tmp $f
done
EDIT
Thanks to @KM. I've get to the place, where I can do what I want in 3 steps.
1 step:
$ for f in *.csv; do
awk -F, '{a[$1]+=$3;}END{for (i in a)print i", "a[i];}' $f | sort > sum$f
done
2nd step:
$ for f in [^sum]*.csv; do
join -t ',' $f sum$f | awk -F, '{print $1"," $4"," $2"," $3}' > tmp && mv tmp $f;
done
To eventually just rm sum*.*
.
Is there is a way to execute it as one command from terminal? Or outside of it?
Best Answer
Save the sum to a file called
sum
, sortedJoin the two files, first column of first file with first column of second (think "keys"); pipe it to
awk
and print reordered columns, using,
as field separator (-F
) and as the Output Field Separator (-OFS
)