Bash command/script to remove lines from a CSV that have duplicate values in a column

awk, bash, sed

I have a lot of CSV files that I have combined. There are duplicates, but the entire line is not duplicated; instead, there is one column whose value I want to use as the criterion for finding duplicates. If a value appears more than once in that column, I want to delete the extra rows until every value in the column is unique.

Does anyone know the best way to accomplish this in Bash, sed, or awk?

Best Answer

awk -F, '!seen[$1]++'

$1 is the first column; change as appropriate. You can key on multiple columns separated by commas (seen[$1,$3]), or on $0 for the whole row. The idiom works because seen is an associative array: !seen[$1]++ is true only the first time a given key is encountered, so awk's default action (printing the line) fires once per unique value, keeping the first occurrence of each key and dropping the rest.
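For example, assuming the combined data lives in a file called combined.csv (the file names here are assumptions, not from the original post), deduplicating on the first column could look like this:

# Keep only the first row for each distinct value in column 1.
# combined.csv and deduped.csv are hypothetical example names.
awk -F, '!seen[$1]++' combined.csv > deduped.csv

Given an input of

id,name
1,alice
2,bob
1,anna

only the first row with id 1 survives:

id,name
1,alice
2,bob

Note that a header line is kept automatically, as long as its value in the key column does not collide with a data value.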
