Uniq a CSV file ignoring a column, awk maybe

awk, csv, sort, text-processing

Given this file (the annotations are not part of the file, but form part of the explanation)…

x,a,001,b,c,d,y
x,a,002,b,c,e,yy
x,bb,003,b,d,e,y
x,c,004,b,d,e,y
x,c,005,b,d,e,y   # nb - dupe of row 4
x,dd,006,b,d,e,y
x,c,007,b,d,e,y   # nb - dupe of row 4 and 5
x,dd,008,b,d,f,y
x,dd,009,b,d,e,y   # nb - dupe of row 6
x,e,010,b,d,f,y

… I would like to derive the following output:

x,a,001,b,c,d,y
x,a,002,b,c,e,yy
x,bb,003,b,d,e,y
x,c,004,b,d,e,y
x,dd,006,b,d,e,y
x,dd,008,b,d,f,y
x,e,010,b,d,f,y

If column 3 were cut from the file, uniq were run over the result, and the surviving rows then had their column-3 values spliced back in at the right places, I'd get the output above.

But I'm really struggling to come up with something that would do this. I'd welcome the opportunity to learn about Linux's text-processing utilities.

Performance: files are unlikely to grow beyond 1 MB, and there is only one file per day.

Target: Debian GNU/Linux 7 amd64, 256 MB RAM / Xeon.

Edit: tweaked the example, as the fields are not fixed-width, so a solution involving uniq --skip-chars=n will not work as far as I can tell.
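
For reference: uniq can only skip a fixed number of leading characters or blank-separated leading fields, and it only compares adjacent lines, so it cannot ignore a variable-width field in the middle of the line. The closest thing to the cut-and-splice idea above seems to be a decorate/sort/undecorate pipeline. A rough sketch, assuming bash (for the $'\t' tab syntax), no tab characters in the data, and a hypothetical input file named input.csv:

# Prefix each line with a dedup key (the line with field 3 blanked)
# and its line number, keep the first line per key, restore the
# original order, then strip the decoration again.
awk -F, -v OFS=, '{line = $0; $3 = ""; print $0 "\t" NR "\t" line}' input.csv |
  sort -t $'\t' -s -u -k1,1 |   # stable + unique: keep the first line per key
  sort -t $'\t' -k2,2n |        # put the survivors back in input order
  cut -f3-                      # strip the key and line number

Three passes over a 1 MB file are harmless, but this is clearly clunkier than a single-pass tool.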

Best Answer

With awk, you could do:

awk -F, -vOFS=, '{l=$0; $3=""}; ! ($0 in seen) {print l; seen[$0]}'
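
Spelled out, that is the same one-liner expanded with comments; the input file name input.csv is an assumption — as written above, the command reads standard input:

awk -F, -v OFS=, '
  {
    l = $0     # remember the full original line
    $3 = ""    # blank out field 3; assigning a field rebuilds $0 using OFS
  }
  ! ($0 in seen) {   # $0 is now the line with field 3 blanked
    print l          # first time this key appears: print the original line
    seen[$0]         # referencing the element creates it, marking the key as seen
  }' input.csv

The trick is that assigning to $3 rebuilds $0 with OFS, so the test ! ($0 in seen) compares lines with field 3 blanked out, while the saved copy l is what gets printed. Every distinct key stays in the seen array for the duration of the run, which is negligible at 1 MB per file.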