Shell Script – Remove Duplicate Values Not on Identical Lines


So I have a text file that pairs file names with an associated number. Currently it looks like this:

RR0.out -1752.142111    
RR1.out -1752.141887    
RR2.out -1752.142111    
RR3.out -1752.140319    
RR4.out -1752.140564    
RR5.out -1752.138532    
RR6.out -1752.138493    
RR7.out -1752.138493    
RR8.out -1752.138532

I want to write a script that removes rows whose second value duplicates that of another row, so that the output would be:

RR0.out -1752.142111    
RR1.out -1752.141887    
RR3.out -1752.140319    
RR4.out -1752.140564    
RR5.out -1752.138532    
RR6.out -1752.138493    
RR8.out -1752.138532    

I have seen both sort -u and uniq used for this, but I cannot figure out how to remove lines that are not exactly identical (which uniq can do, but sort cannot) AND that are not adjacent to one another (which sort can do, but uniq cannot).
Can anyone give me any suggestions?

So far, the code below does not give me what I want:

sort -t ' ' -k 2n file > file2  
uniq -f 1 file2 > file3 

Best Answer

$ sort -uk2 file
RR6.out -1752.138493
RR8.out -1752.138532
RR5.out -1752.138532
RR3.out -1752.140319
RR4.out -1752.140564
RR1.out -1752.141887
RR0.out -1752.142111

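sort -u sorts the input and keeps only one line from each set of lines with equal keys, and -k2 makes the key start at the second column. Strictly, -k2 means "from the second field to the end of the line", so trailing whitespace is part of the key; -k2,2 limits the key to the second field alone. That is most likely why RR5 and RR8 both appear in the output above: in the sample input the RR5 line ends in spaces while the RR8 line does not.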
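A minimal, self-contained sketch for checking that key behavior, assuming `mktemp` is available and using `LC_ALL=C` so that keys are compared byte by byte; the temporary file and the variable names `wide` and `narrow` are illustrative only. It rebuilds the sample input with the trailing spaces it has in the question and compares `-k2` with `-k2,2`.

```shell
# Rebuild the sample input as posted: lines RR0-RR7 end in four spaces,
# the RR8 line does not.
tmp=$(mktemp)
printf '%s\n' \
  'RR0.out -1752.142111    ' \
  'RR1.out -1752.141887    ' \
  'RR2.out -1752.142111    ' \
  'RR3.out -1752.140319    ' \
  'RR4.out -1752.140564    ' \
  'RR5.out -1752.138532    ' \
  'RR6.out -1752.138493    ' \
  'RR7.out -1752.138493    ' \
  'RR8.out -1752.138532' > "$tmp"

# -k2: the key runs from field 2 to the end of the line, trailing spaces
# included, so RR5 and RR8 have different keys and both are kept (7 lines).
wide=$(LC_ALL=C sort -u -k2 "$tmp")

# -k2,2: the key is field 2 only, so RR5 and RR8 collapse into one line (6 lines).
narrow=$(LC_ALL=C sort -u -k2,2 "$tmp")

printf 'sort -u -k2:\n%s\n\nsort -u -k2,2:\n%s\n' "$wide" "$narrow"
rm -f "$tmp"
```

Compare line counts rather than which filename survives: POSIX only says that all but one line of each equal set is suppressed (GNU sort keeps the first one it reads).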

To put the result back in filename order (column one), pipe the output of sort -uk2 back into sort:

$ sort -uk2 file | sort -k1
RR0.out -1752.142111
RR1.out -1752.141887
RR3.out -1752.140319
RR4.out -1752.140564
RR5.out -1752.138532
RR6.out -1752.138493
RR8.out -1752.138532
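If the original file order is what you want, a different technique (an awk associative array instead of sort) avoids the second sort altogether. This is a minimal sketch assuming a POSIX awk; the array name `seen` and the variable `result` are illustrative. It prints each line only the first time its second field appears, in input order. Because awk splits fields on runs of blanks, trailing spaces never reach `$2`, so RR8 counts as a duplicate of RR5 and is dropped, unlike in the outputs shown above. Skipping the re-sort also avoids the lexical ordering of `sort -k1`, which would place a hypothetical RR10.out before RR2.out.

```shell
# Rebuild the sample input from the question (trailing spaces as posted).
tmp=$(mktemp)
printf '%s\n' \
  'RR0.out -1752.142111    ' \
  'RR1.out -1752.141887    ' \
  'RR2.out -1752.142111    ' \
  'RR3.out -1752.140319    ' \
  'RR4.out -1752.140564    ' \
  'RR5.out -1752.138532    ' \
  'RR6.out -1752.138493    ' \
  'RR7.out -1752.138493    ' \
  'RR8.out -1752.138532' > "$tmp"

# seen[$2]++ returns 0 the first time a value is met, so !0 is true and the
# line is printed; later lines with the same second field are skipped.
result=$(awk '!seen[$2]++' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With the sample input this keeps RR0, RR1, RR3, RR4, RR5 and RR6, in that order.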