Display only duplicate lines, ignoring the first x spaces per line

text processing

I have a file with numbered lines. The numbers take up the first 7 characters of each line. I want to check the remainder of each line for duplicates and output only the duplicated lines.

For example, my file might be:

     1 abcde
     2 12345789 
     3 6789   
     4 000000
     5 abcde

In which case I would want my output to be:

     1 abcde
     5 abcde

Output formatting doesn't matter much of course, though it'd be great if the duplicate strings were matched together so I can find them more easily.

I'm using Linux.

Best Answer

Sort the file on the second field, then tell GNU uniq to skip the first 7 characters (-s 7) and print all repeated lines (-D):

$ sort -k2,2 foo | uniq -Ds 7
     1 abcde
     5 abcde
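
If GNU uniq isn't available, or you'd rather not depend on the sort order, a two-pass awk approach should give the same result. This is only a sketch: `show_dups` is a made-up helper name, and it assumes the prefix is a fixed number of characters (7 by default), as in the question.

```shell
# Hypothetical helper: print every line whose text after the first N
# characters (default 7) occurs more than once in the file.
show_dups() {
    file=$1
    skip=${2:-7}
    # The file is read twice: the first pass counts each remainder,
    # the second pass prints the lines whose remainder was counted
    # more than once.
    awk -v skip="$skip" '
        { key = substr($0, skip + 1) }
        NR == FNR { count[key]++; next }
        count[key] > 1
    ' "$file" "$file"
}

# Recreate the sample from the question: 6-wide number, one space, text.
sample=$(mktemp)
printf '%6d %s\n' 1 abcde 2 12345789 3 6789 4 000000 5 abcde > "$sample"

# Lines come out in file order; sorting from field 2 to the end of the
# line groups identical strings together.
show_dups "$sample" | sort -k2
```

Both approaches compare the remainder exactly, so trailing whitespace counts: `abcde` and `abcde ` are treated as different strings. Also, if the text after the number can contain spaces, `sort -k2` (key runs to the end of the line) is probably safer than `sort -k2,2` in the pipeline above, since uniq only detects duplicates that end up on adjacent lines.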