Maximum of column with matching id

awk

I have a file with the following fields

NM001 rp1 210
NM003 rp1 220
NM005 rag 200
NM004 rag 100

I would like to have an output that for every change in column2, find maximum in column4 but also report corresponding column1. So far I found a solution that report column2 and 3 but not column1 because the number in column1 is different. I like to use awk please.

The output should be

NM003 rp1 220
NM005 rag 200

Best Answer

You can do this quite easily with awk.

You need to keep track of the maximum value for each value in the second column, and you also want to keep the whole line for each which has the max value:

awk '
    $3 > maxvals[$2] {lines[$2]=$0; maxvals[$2]=$3}
    END { for (tag in lines) print lines[tag] }
'

The output is not sorted, so pipe into sort if you need a particular sorting.

Related Question