So I have some text that contains file names, each with an associated number. Currently it looks like this:
RR0.out -1752.142111
RR1.out -1752.141887
RR2.out -1752.142111
RR3.out -1752.140319
RR4.out -1752.140564
RR5.out -1752.138532
RR6.out -1752.138493
RR7.out -1752.138493
RR8.out -1752.138532
I want to write a script that removes rows whose second value duplicates that of an earlier row (keeping the first occurrence), so that the output would be:
RR0.out -1752.142111
RR1.out -1752.141887
RR3.out -1752.140319
RR4.out -1752.140564
RR5.out -1752.138532
RR6.out -1752.138493
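For reference, the rule as stated (keep the first line for each second-column value and drop later repeats, without reordering) is what the common awk idiom !seen[$2]++ does. The sketch below assumes a POSIX awk and writes the sample data to a throwaway temporary file purely for illustration. By this rule RR8.out is dropped as well, since its value repeats that of RR5.out.

```shell
# Write the sample data to a throwaway temporary file (illustrative only).
data=$(mktemp)
cat > "$data" <<'EOF'
RR0.out -1752.142111
RR1.out -1752.141887
RR2.out -1752.142111
RR3.out -1752.140319
RR4.out -1752.140564
RR5.out -1752.138532
RR6.out -1752.138493
RR7.out -1752.138493
RR8.out -1752.138532
EOF

# seen[$2]++ evaluates to 0 the first time a second-field value appears, so
# !seen[$2]++ is true exactly once per value: the first line is printed and
# later repeats are skipped, with the input order left unchanged.
out=$(awk '!seen[$2]++' "$data")
printf '%s\n' "$out"
# Prints the RR0, RR1, RR3, RR4, RR5 and RR6 lines (RR2, RR7, RR8 are repeats).
rm -f "$data"
```

Because awk reads the file in a single pass and remembers every value it has seen, no sorting is needed and the duplicate lines do not have to be adjacent.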
I have seen both sort -u and uniq used for this, but I cannot figure out how to remove lines that are not exactly identical (which can be done with uniq but not sort) AND not adjacent to one another (which can be done with sort but not uniq).
Can anyone give me any suggestions?
So far, the code below does not give me what I want:
sort -t ' ' -k 2n file > file2
uniq -f 1 file2 > file3
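For comparison, this is what the two-step attempt above produces on the sample data, written as a single pipeline. It is a sketch assuming GNU or POSIX sort and uniq, with LC_ALL=C set (an assumption) so that sort -n treats '.' as the decimal point. The repeated values are removed, but the surviving lines come out in numeric order of the second column rather than in file-name order.

```shell
# Assumption: force the C locale so that sort -n uses '.' as the decimal point.
export LC_ALL=C

# Write the sample data to a throwaway temporary file (illustrative only).
data=$(mktemp)
cat > "$data" <<'EOF'
RR0.out -1752.142111
RR1.out -1752.141887
RR2.out -1752.142111
RR3.out -1752.140319
RR4.out -1752.140564
RR5.out -1752.138532
RR6.out -1752.138493
RR7.out -1752.138493
RR8.out -1752.138532
EOF

# Step 1: sort numerically on the second space-separated field, which makes
# lines with equal values adjacent.
# Step 2: uniq -f 1 skips the first field, so each run of equal values
# collapses to its first line.
out=$(sort -t ' ' -k 2n "$data" | uniq -f 1)
printf '%s\n' "$out"
# Order: RR0, RR1, RR4, RR3, RR5, RR6 (most negative value first).
rm -f "$data"
```

On this data the pipeline does remove the repeated values; what it does not keep is the original file-name order.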
Best Answer
sort -u will sort the output and produce only unique values, and -k2 will do the sorting/uniquing based on the second column. To reorder the output by the file names in column one, you can pipe it back into sort:
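A minimal sketch of the whole approach, using the same sample data in a throwaway temporary file. It uses -k2,2 rather than -k2: -k2 means "from the second field to the end of the line", which is the same thing here because the second column is the last one. Which of two lines with an equal key survives -u is up to the sort implementation (GNU sort documents that it outputs the first of each run of equal lines), so check your sort's documentation before relying on a particular survivor.

```shell
# Write the sample data to a throwaway temporary file (illustrative only).
data=$(mktemp)
cat > "$data" <<'EOF'
RR0.out -1752.142111
RR1.out -1752.141887
RR2.out -1752.142111
RR3.out -1752.140319
RR4.out -1752.140564
RR5.out -1752.138532
RR6.out -1752.138493
RR7.out -1752.138493
RR8.out -1752.138532
EOF

# sort -u -k2,2: sort on the second field only and keep one line per
# distinct value of that field.
# Second sort: restore file-name order. Plain lexical order works here
# because every name has the form RR<single digit>.out.
out=$(sort -u -k2,2 "$data" | sort)
printf '%s\n' "$out"
rm -f "$data"
```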