Display only duplicate lines, ignoring the first x spaces per line

text processing

I have a file with numbered lines. The line numbers take up the first 7 characters of each line. I want to check the remainder of each line for duplicates and output only the duplicated lines.

For example, my file might be:

In that case, I would want my output to be:

     1 abcde
     5 abcde

Output formatting doesn't matter much, though it would be great if duplicate strings were grouped together so I can find them more easily.

I'm using Linux.

Best Answer

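Sort the file on everything from the second field to the end of the line, then tell GNU uniq to skip the first 7 characters (-s 7) and print all repeated lines (-D):

$ sort -k2 foo | uniq -Ds 7
     1 abcde
     5 abcde

Use -k2 rather than -k2,2 if the text after the number can contain spaces. With -k2,2 only the first word is the sort key, and ties are broken by comparing whole lines (number first), so two identical remainders can be separated by a line whose first word matches but whose remainder differs. uniq only detects duplicates that are adjacent, so it would miss them.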
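A self-contained sketch of the same approach, assuming a hypothetical input file named foo whose numbers are right-aligned in 6 columns followed by one space (so exactly 7 characters to skip). It swaps -D for GNU uniq's --all-repeated=separate, which prints a blank line between groups of duplicates and so keeps matching strings together, and it sorts with -k2 so the key covers the whole remainder of the line:

```shell
# Hypothetical sample input: 6-column right-aligned number + 1 space = 7 chars to skip.
printf '%6d %s\n' 1 abcde 2 fghij 3 klmno 4 pqrst 5 abcde 6 fghij 7 uvwxy > foo

# Sort from the second field to the end of the line so identical remainders
# become adjacent, then let GNU uniq skip the 7-character prefix and print
# every repeated line, with a blank line between groups.
sort -k2 foo | uniq --all-repeated=separate -s 7
```

With this input it should print the two abcde lines, a blank line, then the two fghij lines.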

Related Solutions

Shell – Print nth line before the matched line, Matching line and nth line from the matched line

Here's a perl one-liner:

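$ perl -ne '$n=3;push @lines,$_; END{for($i=0;$i<=$#lines;$i++){
  if ($lines[$i]=~/blah/){
    print $lines[$i-$n] if $i-$n >= 0;          # nth line before (if any)
    print $lines[$i];                            # the matching line
    print $lines[$i+$n] if $i+$n <= $#lines }}   # nth line after (if any)
 }' example.txt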
b
blah
g
f
blah
g

To change the offset, change $n=3; to $n=N, where N is how many lines before and after the match to look. To change the matched pattern, change if ($lines[$i]=~/blah/) to if ($lines[$i]=~/PATTERN/).
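For comparison, the same selection can be sketched in plain shell with grep, cut and sed instead of perl. This is an alternative, not part of the answer above; the function name nth_around and the sample example.txt are hypothetical, grep treats the pattern as a basic regular expression rather than a Perl one, and sed re-reads the file for every printed line, so it only suits small files:

```shell
# Hypothetical helper (not from the original answer): print the nth line
# before each line matching PATTERN, the match itself, and the nth line after.
# Usage: nth_around N PATTERN FILE
nth_around() {
  n=$1 pattern=$2 file=$3
  # grep -n prefixes each match with "LINENO:"; keep only the line number.
  grep -n -- "$pattern" "$file" | cut -d: -f1 | while read -r i; do
    for j in $((i - n)) "$i" $((i + n)); do
      # Skip positions before line 1; sed prints nothing past the last line.
      if [ "$j" -ge 1 ]; then
        sed -n "${j}p" "$file"
      fi
    done
  done
}

# Hypothetical example.txt with "blah" on lines 5 and 10.
printf '%s\n' a b c d blah e f g h blah i j k > example.txt
nth_around 3 blah example.txt
```

With n = 3 this should print b, blah, g for the match on line 5 and f, blah, k for the match on line 10.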

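If the lines in the file already start with numbers, you can renumber the printed lines sequentially by piping the result through a second perl command that replaces the first run of digits on each line with the current output line number ($.):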

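$ perl -ne '$n=3;push @lines,$_; END{for($i=0;$i<=$#lines;$i++){
  if ($lines[$i]=~/blah/){
    print $lines[$i-$n] if $i-$n >= 0;          # nth line before (if any)
    print $lines[$i];                            # the matching line
    print $lines[$i+$n] if $i+$n <= $#lines }}   # nth line after (if any)
 }' example.txt | perl -pe 's/\d+/$./'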
1. b
2. blah
3. g
4. f
5. blah
6. g

Awk from different lines

awk solution:

awk 'v && NR==n{ print $6,v > "result.txt" }/^!/{ v=$5; n=NR+1 }' file

<condition1> { <statement> ... } <condition2> { <statement> ... } - for every input line, each condition and its statements are evaluated in order
/^!/{ v=$5; n=NR+1 } - on a line starting with !, save the 5th field ($5) in v and store the number of the following line (NR+1) in n
v && NR==n - if a value has been saved in v and the current record number NR equals n (the line right after the ! line), print the 6th field ($6) followed by v to the file result.txt

The result.txt file contents:

188 -9744.24963670
140 -9744.30001681
155 -9744.33953891
164 -9744.36584201
154 -9744.37925372
153 -9744.39185493
160 -9744.39836617
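The input file isn't shown, so here is a run on a hypothetical data.txt laid out the way the program expects: each line starting with ! carries the value in field 5, and the line right after it carries the number in field 6 (all field contents are made up for illustration):

```shell
# Hypothetical input: "!" lines hold the value in $5, the next line holds $6.
cat > data.txt <<'EOF'
! step 1 energy -12.5
a b c d e 42
! step 2 energy -13.25
a b c d e 17
EOF

# Same program as above: remember $5 from a "!" line, then on the very next
# line write that line's $6 followed by the remembered value to result.txt.
awk 'v && NR==n{ print $6,v > "result.txt" }/^!/{ v=$5; n=NR+1 }' data.txt
cat result.txt
```

result.txt should then contain 42 -12.5 and 17 -13.25, one pair per line.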
