Ubuntu – Sorting values and grepping the best score (highest number)

bash, command line, grep, sort

I have a file that looks like this:

    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    8  C00000002 score:  -39.520 nathvy =  49 nconfs =         3129
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   10  C00000002 score:  -38.454 nathvy =  49 nconfs =         9473
   11  C00000004 score:  -37.704 nathvy =  24 nconfs =          156
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51
    2  C00000002 score:  -48.649 nathvy =  49 nconfs =         3878
    3  C00000001 score:  -44.988 nathvy =  41 nconfs =         1988
    4  C00000002 score:  -42.674 nathvy =  49 nconfs =         6740
    5  C00000002 score:  -42.453 nathvy =  49 nconfs =         4553
    6  C00000002 score:  -41.829 nathvy =  49 nconfs =         7559

The second column contains IDs that are not sorted, and some of them repeat (C00000001, for example). Each line also has a number after score: (it most often starts with -).

What I would like to do is:

1) Read the second column (unsorted IDs) and always pick the first line in which each ID appears. For C00000001, that would be the line with score: -37.558.

2) Once each ID appears only once, sort the remaining lines by the number after score:, so that the most negative number comes first and the most positive one comes last.

The output should be printed with the same structure as the input file.

Best Answer

$ sort -k2,2 -u < filename | sort -k4,4n

    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51

Explanation:

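  1. sort -k2,2 -u: compares lines on the second column only and outputs a single line for each distinct ID. With GNU sort, the line kept is the first one for that ID in input order, because lines with equal keys are not reordered. POSIX does not specify which of the equal lines is kept, so other sort implementations may behave differently.
  2. sort -k4,4n: sorts numerically on the fourth column (the score). Ascending numeric order already puts the most negative score first, so -r is not needed.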
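
If you would rather not depend on which duplicate sort -u keeps, the same two steps can be sketched with awk doing the de-duplication. This is only an alternative under a few assumptions: the temporary file created with mktemp stands in for your real file, the variable names input and result are made up for the example, and LC_ALL=C is set so that sort reads "." as the decimal point regardless of locale.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file
# (a stand-in for the real input file).
input=$(mktemp)
cat > "$input" <<'EOF'
    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    8  C00000002 score:  -39.520 nathvy =  49 nconfs =         3129
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   10  C00000002 score:  -38.454 nathvy =  49 nconfs =         9473
   11  C00000004 score:  -37.704 nathvy =  24 nconfs =          156
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51
    2  C00000002 score:  -48.649 nathvy =  49 nconfs =         3878
    3  C00000001 score:  -44.988 nathvy =  41 nconfs =         1988
    4  C00000002 score:  -42.674 nathvy =  49 nconfs =         6740
    5  C00000002 score:  -42.453 nathvy =  49 nconfs =         4553
    6  C00000002 score:  -41.829 nathvy =  49 nconfs =         7559
EOF

# Step 1: awk prints a line only the first time its ID (column 2) is seen.
# Step 2: sort orders the surviving lines numerically on column 4 (the score),
#         ascending, so the most negative score comes first.
result=$(awk '!seen[$2]++' "$input" | LC_ALL=C sort -k4,4n)

# Lines are passed through unchanged, so the input layout is preserved.
printf '%s\n' "$result"

rm -f "$input"
```

On the sample data this leaves one line per ID, in the order C00000002, C00000004, C00000001, which matches the output shown above. The awk step keeps the first occurrence by construction, since it never reorders lines.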