I have the following file:
6180,6180,0,1,,1,0,1,1,0,0,0,0,0,0,0,0,4326,4326,,0.440000,
6553,6553,0,1,,1,0,1,1,0,0,0,0,1,0,1,0,4326,4326,,9.000000,
1297,1297,0,0,,0,0,1,0,0,0,0,0,1,0,1,0,1707,1707,,7.000000,
6598,6598,0,1,,1,0,1,1,0,0,0,1,0,0,0,0,1390,1390,,0.730000,
4673,4673,0,1,,1,0,1,1,0,0,0,0,0,0,0,0,1707,1707,,0.000000,
I need an awk command that prints, for each distinct value of $18, the line with the maximum value of $21.
The desired output will look like this:
6553,6553,0,1,,1,0,1,1,0,0,0,0,1,0,1,0,4326,4326,,9.000000,
1297,1297,0,0,,0,0,1,0,0,0,0,0,1,0,1,0,1707,1707,,7.000000,
6598,6598,0,1,,1,0,1,1,0,0,0,1,0,0,0,0,1390,1390,,0.730000,
I got this result using the sort command, as below:
sort -t, -k18,18n -k21,21nr | awk -F"," '!a[$18]++'
but I would like to do it with a single awk command.
Please advise,
Best Answer
I don't see why you would want to do it in a single awk command; what you have seems perfectly fine. Anyway, here's one way:
The idea is very simple. We have two arrays. Array max has $18 as a key and $21 as a value. For every line, if there is no value stored for $18 yet, or if the saved value for $18 is smaller than $21, then we store $21 in max[$18] and store the current line ($0) as the value for $18 in array line. Finally, in the END{} block, we print every element of array line.
Note that the script above treats $18 as a string. Therefore, 001 and 1 will be considered different strings.