Remove Duplicate Entries from CSV File – Text Processing Guide

files, text processing

I've got a CSV file with duplicate data, i.e. the same lines printed twice. I've tried removing them with sort myfile.csv | uniq -u, but there is no change in myfile.csv. I've also tried sudo sort myfile.csv | uniq -u, with no difference.

So currently my CSV file looks like this:

a
a
a
b
b
c
c
c
c
c

I would like it to look like this:

a
b
c

Best Answer

The reason myfile.csv is not changing is that the -u option for uniq prints only lines that are not repeated. In this file, every line appears more than once, so none of them are printed at all.

More importantly, even without -u the output would not be saved to myfile.csv, because uniq just writes to stdout (by default, your terminal) and leaves the input file untouched.
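To see both effects, here is a minimal sketch (the file path /tmp/demo.csv is just an illustration):

```shell
# Create a sample file where every line is duplicated.
printf 'a\na\nb\nb\n' > /tmp/demo.csv

# After sorting, -u prints only lines that occur exactly once,
# so for this file it prints nothing at all.
sort /tmp/demo.csv | uniq -u

# The original file is untouched: the pipeline only wrote to stdout.
# To deduplicate (keep one copy of each line), drop -u and redirect
# to a *different* file, then move it into place.
sort /tmp/demo.csv | uniq > /tmp/demo.dedup && mv /tmp/demo.dedup /tmp/demo.csv
cat /tmp/demo.csv
```

Note that redirecting straight back with > myfile.csv would not work either: the shell truncates the output file before sort ever reads it, which is why the answer below uses sort's own -o option instead.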

You would need to do something like this:

$ sort -u myfile.csv -o myfile.csv

The options mean:

  • -u - keep only unique lines
  • -o - output to this file instead of stdout

See man sort for more information.
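Putting it together, a quick sanity check on a sample file matching the question (the /tmp path is illustrative):

```shell
# Build a sample file with the duplicated lines from the question.
printf 'a\na\na\nb\nb\nc\nc\nc\nc\nc\n' > /tmp/myfile.csv

# Sort, keep one copy of each line, and write the result back in place.
sort -u /tmp/myfile.csv -o /tmp/myfile.csv

cat /tmp/myfile.csv
```

Writing back to the input file is safe here because sort reads all of its input before opening the -o output file, unlike a shell redirection.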
