How to remove duplicate lines in a CSV based on first field, and 1st n chars of 2nd field

csv, text-processing

For a 3-column CSV file, list.csv, how would you remove subsequent duplicate rows where the 1st field matches and the first 3 characters of the 2nd field also match? Some rows will have a 2nd field with fewer than 3 characters.

list.csv:

12,12345,a
12,12345,b
123,12345,a
1234,12,b
1234,12345,a
567,567,a
567,56712,a
567,56734,a
567,6789,a

Expected output:

12,12345,a
123,12345,a
1234,12,b
1234,12345,a
567,567,a
567,6789,a

Best Answer

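sort should work as well: use the 1st field and the first 3 characters of the 2nd field as the sort keys, and let -u drop rows whose keys repeat. Note that the output comes back sorted by those keys rather than in the original input order.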

 sort -t, -k1,1 -k2.1,2.3 -u <list.csv
 12,12345,a
 123,12345,a
 1234,12,b
 1234,12345,a
 567,567,a
 567,6789,a
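
If the original row order has to be preserved, an awk one-liner is a common alternative to sort. The following is only a sketch, not part of the answer above: the function name dedupe_csv is made up for illustration, and it assumes plain comma-separated fields with no quoted commas. It builds a key from the 1st field plus the first 3 characters of the 2nd field (substr simply returns the whole field when it is shorter) and prints a row only the first time its key is seen.

```shell
# Hypothetical helper: keep the first row for each
# (field 1, first 3 chars of field 2) pair, in input order.
dedupe_csv() {
  awk -F, '!seen[$1 FS substr($2, 1, 3)]++' "$1"
}

# Recreate the sample data from the question.
cat > list.csv <<'EOF'
12,12345,a
12,12345,b
123,12345,a
1234,12,b
1234,12345,a
567,567,a
567,56712,a
567,56734,a
567,6789,a
EOF

dedupe_csv list.csv
```

On the sample data this prints the same six rows as the expected output in the question. Unlike the sort approach, it does not reorder the file, which matters when the input is not already sorted.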