I have a large file in the following format:
2 1019 0 12
2 1019 3 0
2 1021 0 2
2 1021 2 0
2 1022 4 5
2 1030 0 1
2 1030 5 0
2 1031 4 4
If the values in column 2 match, I want to sum the values in column 3 and 4 of both lines, else just the sum of the values in the unique line.
So the output I am hoping for would look like this:
2 1019 15
2 1021 4
2 1022 9
2 1030 6
2 1031 8
I am able to sort files according to column 2 with awk
or sort
and sum the last columns with awk
, but only for individual lines not for two lines where column 2 matches.
Best Answer
I would do this in Perl:
Or awk:
If you want the output sorted according to the second column you could just pipe to
sort
:Note that both solutions include the 1st column as well. The idea is to use the first and second columns as keys to a hash (in perl) or an associative array (in awk). The key in each solution is
column1 column2
so if two lines have the same column two but a different column one, they will be grouped separately: