How to print the five highest numbers from a column

awk, perl, sort

I have a text file with four columns. I need to read to the end of the file and print the five highest values from column 3, along with the matching column 1.

input.txt

xm|340034177|ref|RT_235820.1|   139697  192 0
xm|161622288|ref|RT_340093.1|   153819  2607    0
xm|75755638|ref|RT_557407.1|    153821  1937    0
xm|108773031|ref|RT_678101.1|   161452  1688    0
xm|30352011|ref|RT_784766.1|    150568  105 0

output.txt

xm|161622288|ref|RT_340093.1|   2607
xm|75755638|ref|RT_557407.1|    1937
xm|108773031|ref|RT_678101.1|   1688
xm|340034177|ref|RT_235820.1|   192
xm|30352011|ref|RT_784766.1|    105

Best Answer

sort -k3n,3 filename | tail -5 | cut -d " " -f1,6-7

The above command sorts the file numerically on the 3rd field in ascending order, so the five largest values end up on the last five lines, which tail -5 keeps. To print only the first and third columns, I pipe that result to cut. Note that cut -d " " treats every single space as a delimiter, so the field numbers 6-7 depend on the exact spacing in this file and may need adjusting for other input.
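To see where field numbers like 6-7 come from, here is a small sketch of how cut counts fields when several spaces sit between columns. The sample line is the first row of the question's input. Squeezing repeated spaces with tr is an alternative I am suggesting, not part of the original command, and it assumes the separators are spaces rather than tabs. The variable names are illustrative only.

```shell
line='xm|340034177|ref|RT_235820.1|   139697  192 0'

# Every single space is a delimiter, so the three spaces after column 1
# create two empty fields: 139697 is field 4, 192 is field 6, 0 is field 7.
raw=$(printf '%s\n' "$line" | cut -d ' ' -f1,6-7)
printf '%s\n' "$raw"

# Squeezing runs of spaces first makes cut's field numbers match the columns.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f1,3)
printf '%s\n' "$squeezed"
```

On this row the unsqueezed cut also prints the trailing 0 from column 4, because the row has fewer spaces between its columns than the others.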

Testing

cat filename

xm|340034177|ref|RT_235820.1|   139697  192 0
xm|161622288|ref|RT_340093.1|   153819  2607    0
xm|75755638|ref|RT_557407.1|    153821  1937    0
xm|108773031|ref|RT_678101.1|   161452  1688    0
xm|30352011|ref|RT_784766.1|    150568  105 0
xm|340034177|ref|RT_235820.1|   139697  192 0
xm|161622288|ref|RT_340093.1|   153819  607    0
xm|75755638|ref|RT_557407.1|    153821  937    0
xm|108773031|ref|RT_678101.1|   161452  1881    0
xm|30352011|ref|RT_784766.1|    150568  1051 0

Now, I run the above command on this file.

sort -k3n,3 filename | tail -5 | cut -d " " -f1,6-7

The output that I get is:

xm|30352011|ref|RT_784766.1|  1051
xm|108773031|ref|RT_678101.1| 1688 
xm|108773031|ref|RT_678101.1| 1881 
xm|75755638|ref|RT_557407.1|  1937
xm|161622288|ref|RT_340093.1| 2607 
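This result is in ascending order, while the expected output in the question lists the largest value first. A minimal sketch of a variant, assuming the columns are separated by runs of spaces or tabs: sort in reverse numeric order, keep the first five lines with head, and let awk pick columns 1 and 3 so the exact spacing no longer matters. The `data` and `top5` variable names are illustrative. With a real file, `sort -k3,3nr input.txt` would replace the printf step.

```shell
# Sample rows copied from the question's input.txt.
data='xm|340034177|ref|RT_235820.1|   139697  192 0
xm|161622288|ref|RT_340093.1|   153819  2607    0
xm|75755638|ref|RT_557407.1|    153821  1937    0
xm|108773031|ref|RT_678101.1|   161452  1688    0
xm|30352011|ref|RT_784766.1|    150568  105 0'

# -k3,3nr : use field 3 only as the key, compare numerically, largest first.
# head -n 5: keep the five rows with the largest values.
# awk     : splits on runs of blanks, so $1 and $3 are columns 1 and 3.
top5=$(printf '%s\n' "$data" | sort -k3,3nr | head -n 5 | awk '{print $1, $3}')
printf '%s\n' "$top5"
```

With awk doing the column selection, the output has a single space between the two columns regardless of how the input is padded.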

EDIT

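If column 3 may contain values in scientific notation (for example 1.5e3), use the -g (general numeric) flag in place of -n. Plain -n already handles negative numbers and decimal fractions, and GNU sort rejects -n and -g on the same key as incompatible, so the two should not be combined. The command would look like,

sort -k3g,3 filename | tail -5 | cut -d " " -f1,6-7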
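A small sketch of the difference between the two numeric modes, using three made-up rows (not taken from the question) whose third column mixes plain integers, a negative number, and scientific notation. It assumes a sort that supports -g, which is an extension beyond POSIX and is available in GNU sort.

```shell
# Hypothetical rows: label, filler column, value to sort on.
rows='a 1 1e3
b 1 200
c 1 -5'

# -n stops reading the number at the "e", so 1e3 is compared as 1.
with_n=$(printf '%s\n' "$rows" | sort -k3,3n | cut -d ' ' -f1)

# -g parses 1e3 as 1000, so it sorts after 200.
with_g=$(printf '%s\n' "$rows" | sort -k3,3g | cut -d ' ' -f1)

printf 'order with -n:\n%s\n' "$with_n"
printf 'order with -g:\n%s\n' "$with_g"
```

Both modes place the negative value first. Only the position of the 1e3 row changes.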