Ubuntu – Print unique words, total number of occurrences and sum using `awk`

awk, command line, text processing

How can I print unique words, number of their occurrences and the sum of their values in the relevant column using a single array in awk?

I'm using awk like:

awk -F, '{sum[$1]+=$2} END{for (x in sum) print x, sum[x]}' inFile

Can I modify the command above to also print the number of occurrences of each unique word? For example, the sample input below should produce the following result:

Result (the order of the printed results doesn't matter):

A 2 25
B 1 12
C 3 18

Input:

A,15
C,13
C,4
A,10
B,12
C,1

I can add another array to count them separately, but I think there should be a way to print this using just the same array.
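For comparison, a minimal sketch of that two-array approach. The array name `cnt` and the shell function name `count_and_sum_two_arrays` are illustrative assumptions, not part of the original command; the output is piped through `sort` only to make the order predictable.

```shell
# Hypothetical wrapper: one array for the count, one for the sum.
count_and_sum_two_arrays() {
    awk -F, '
        { cnt[$1]++; sum[$1] += $2 }                    # per-word count and running total
        END { for (x in sum) print x, cnt[x], sum[x] }  # word, occurrences, sum
    ' "$@"
}

# Sample input from the question, fed on standard input.
printf 'A,15\nC,13\nC,4\nA,10\nB,12\nC,1\n' | count_and_sum_two_arrays | sort
```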

Does the array `sum` have any index that stores how many times each word was seen?

Best Answer

This should do:

awk -F, '{x[$1]["count"]++;x[$1]["sum"]+=$2}END{for(y in x){print y,x[y]["count"],x[y]["sum"]}}' in

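Basically, you replace the array with a multidimensional array, so that it stores both the count of occurrences of each unique first field and the sum of the corresponding second fields. Note that true arrays of arrays (`x[i][j]`) are a GNU awk extension (gawk 4.0 or later); other awk implementations will reject this syntax.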

% cat in
A,15
C,13
C,4
A,10
B,12
C,1
% awk -F, '{x[$1]["count"]++;x[$1]["sum"]+=$2}END{for(y in x){print y,x[y]["count"],x[y]["sum"]}}' in
A 2 25
B 1 12
C 3 18
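The `x[$1]["count"]` syntax relies on GNU awk's arrays of arrays. Where only a POSIX awk (for example mawk) is available, a similar single-array result can be sketched with compound subscripts, which awk joins internally using `SUBSEP`. This is an alternative technique, not the command from the answer. The function name `count_and_sum` and the helper array `part` are illustrative assumptions.

```shell
# Hypothetical wrapper: single array, keys of the form word SUBSEP "count"/"sum".
count_and_sum() {
    awk -F, '
        { x[$1, "count"]++; x[$1, "sum"] += $2 }
        END {
            for (k in x) {
                split(k, part, SUBSEP)        # part[1] = word, part[2] = "count" or "sum"
                if (part[2] == "count")       # print each word once, from its count key
                    print part[1], x[k], x[part[1], "sum"]
            }
        }
    ' "$@"
}

# Sample input from the question; sort only fixes the output order.
printf 'A,15\nC,13\nC,4\nA,10\nB,12\nC,1\n' | count_and_sum | sort
# A 2 25
# B 1 12
# C 3 18
```

The trade-off is that every word occupies two keys in the array, so the `END` loop has to skip the `sum` keys to avoid printing each word twice.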