Bash – Identifying duplicate fields and printing both instances with awk

awk bash duplicates shell-script

I have a file with multiple columns and want to identify the lines where specific column values (columns 3-6) are duplicated.

The following code finds the duplicates, but I want to display both instances, not just the second. The other column values (columns 1, 2, and 7 onward) can differ between the two lines, which is why I need to see both instances.

awk 'seen[$3, $4, $5, $6]++ == 1' filename

Best Answer

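uniq is the right tool for this, provided the duplicate lines are adjacent (uniq only compares neighbouring lines, so sort the input first if they are not):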

uniq -D -f2 file

Where:

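-D - prints every line of each duplicate group (a GNU uniq option)
-f2 - skips the first 2 fields when comparing, so everything from field 3 to the end of the line is compared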
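
As a rough illustration of the options above, here is a minimal sketch on made-up data (the sample rows and the variable names are assumptions, not from the question). It sorts on field 3 onward first, since uniq would otherwise miss duplicates that are not next to each other, and it assumes GNU uniq for -D.

```shell
# Hypothetical sample: rows 1 and 3 match from column 3 onward.
sample=$(mktemp)
cat > "$sample" <<'EOF'
a 1 x y z w
b 2 p q r s
c 3 x y z w
EOF

# Sort on field 3 to end of line so matching rows become adjacent
# (LC_ALL=C just keeps the ordering predictable), then let uniq
# skip the first two fields (-f2) and print all duplicates (-D).
dups=$(LC_ALL=C sort -k3 "$sample" | uniq -D -f2)
printf '%s\n' "$dups"
# a 1 x y z w
# c 3 x y z w

rm -f "$sample"
```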

Edit: If fields 7 and above must not be compared, you need awk:

awk 'n=x[$3,$4,$5,$6]{print n"\n"$0;} {x[$3,$4,$5,$6]=$0;}' file

The pattern looks up the array item x[] keyed on columns 3-6 and assigns its value to n. If that item is already set (an earlier line had the same key), the block in {...} runs.
Inside {...}, n (the earlier line with that key) and the current line $0 are printed.
The second block then stores the current line in x[] under the same key, for later comparison.
Note: if a key occurs three or more times, each middle line is printed twice, because every match prints the previously stored line again.
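
If a key can occur more than twice, one alternative (a swapped-in technique, not the one-liner above) is to read the file twice: count each key on the first pass, then print every line whose key was counted more than once. A minimal sketch, where the function name show_dups and the sample rows are made up for illustration:

```shell
# Print every line whose columns 3-6 occur more than once.
# Pass 1 (NR == FNR, i.e. the first copy of the file) counts keys;
# pass 2 prints the lines whose key count is greater than 1.
show_dups() {
    awk 'NR == FNR { count[$3, $4, $5, $6]++; next }
         count[$3, $4, $5, $6] > 1' "$1" "$1"
}

# Hypothetical sample: rows a, c and d share columns 3-6; column 7 differs.
sample2=$(mktemp)
cat > "$sample2" <<'EOF'
a 1 x y z w foo
b 2 p q r s bar
c 3 x y z w baz
d 4 x y z w qux
EOF

all_dups=$(show_dups "$sample2")
printf '%s\n' "$all_dups"
# a 1 x y z w foo
# c 3 x y z w baz
# d 4 x y z w qux

rm -f "$sample2"
```

Lines come out in their original file order and each is printed once. The trade-off is that the input must be a regular file that can be read twice, not a pipe.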

Related Solutions

Bash commands/script to remove a line from CSV with duplicate in column

awk -F, '!seen[$1]++'

$1 is the first column; change it as appropriate. You can key on multiple columns separated by commas (seen[$1,$3]), or use $0 for the whole row.
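
A small sketch of the multi-column form, using an invented CSV (the file contents and variable names are assumptions for illustration only). Only the first row seen for each (column 1, column 3) pair is kept:

```shell
# Hypothetical CSV: rows 2, 3 and 5 share the same id and city.
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,name,city
1,ann,oslo
1,bob,oslo
2,ann,oslo
1,cat,oslo
EOF

# seen[$1,$3]++ is 0 (false) the first time a pair appears, so the
# negation is true and awk prints the row; later repeats are skipped.
deduped=$(awk -F, '!seen[$1,$3]++' "$csv")
printf '%s\n' "$deduped"
# id,name,city
# 1,ann,oslo
# 2,ann,oslo

rm -f "$csv"
```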

Bash – Keeping First Instance of Duplicates

sort by itself should suffice. First, sort so that rows are grouped by fields 3-6, with the records within each group further ordered by fields 5 and 1 (both reversed, per the r flags). Pipe this to sort -u on fields 3-6; -u disables the last-resort comparison, so the first record from each 3-6 group is returned. Finally, pipe the result to sort once more, this time by fields 5 and 1:

sort -k3,6 -k5,5r -k1,1r file | sort -k3,6 -u | sort -k5,5r -k1,1r
A B C D E F G
1 2 T TACA A 3 2 Q
9 3 A C 9 3 P
8 3 I R 8 2 Q
