Text Processing – Command Line Method to Drop a Column in a CSV File

Having a file of the following contents:

1111,2222,3333,4444
aaaa,bbbb,cccc,dddd

I want to produce a file identical to the original but without the n-th column, for example, for n = 2 (or 3, depending on where counting starts):

1111,2222,4444
aaaa,bbbb,dddd

or, for n = 0 (or 1):

2222,3333,4444
bbbb,cccc,dddd

A real file can be gigabytes in size and have tens of thousands of columns.

As always in such cases, I suspect command line magicians can offer an elegant solution… 🙂

In my actual case I need to drop the first 2 columns, which can be done by dropping the first column twice in a row, but I suppose it would be more interesting to generalise a bit.

Best Answer

I believe this is specific to cut from GNU coreutils:

$ cut --complement -f 3 -d, inputfile
1111,2222,4444
aaaa,bbbb,dddd

Normally you specify the fields you want via -f, but adding --complement inverts the selection, so the listed fields are the ones that get dropped. From 'man cut':

--complement
    complement the set of selected bytes, characters or fields
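
For the case mentioned in the question (dropping the first two columns), -f also accepts ranges, so one pass should be enough. The following is a minimal sketch, assuming GNU cut for the --complement lines; the temporary sample file and the variable names are only illustrative. The last command shows that plain -f ranges can express the same selection without --complement, which may help where GNU cut is not available.

```shell
# Recreate the sample data from the question in a temporary file
# (the file and the variable names are illustrative assumptions).
inputfile=$(mktemp)
printf '1111,2222,3333,4444\naaaa,bbbb,cccc,dddd\n' > "$inputfile"

# Drop fields 1 and 2 in a single pass (--complement is a GNU extension).
first_two_dropped=$(cut --complement -d, -f 1-2 "$inputfile")
printf '%s\n' "$first_two_dropped"

# Drop only field 3 without --complement: keep fields 1-2 and 4 onwards.
third_dropped=$(cut -d, -f 1-2,4- "$inputfile")
printf '%s\n' "$third_dropped"

rm -f "$inputfile"
```

Since cut works line by line, this approach should also cope with very large files.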

One caveat: if any field contains a comma (for example, inside a quoted value), cut will split it in the wrong place, because cut is not a CSV parser the way a spreadsheet is; it treats every comma as a delimiter. Parsers also differ in how they handle escaped commas in CSV. For simple CSV on the command line, cut is still the way to go.
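
To make the caveat concrete, here is a small sketch with a made-up line whose second field is quoted and contains a comma (the sample data and variable names are assumptions for illustration, and --complement again assumes GNU cut). cut sees four comma-separated fields rather than three, so dropping "field 2" removes only the first half of the quoted value.

```shell
# A made-up CSV line: the second field is quoted and contains a comma.
line='1,"Smith, John",3'

# cut splits on every comma, so field 2 is just '"Smith',
# and the rest of the quoted value is left behind.
mangled=$(printf '%s\n' "$line" | cut --complement -d, -f 2)
printf '%s\n' "$mangled"
```

If the data can contain quoted commas like this, a tool that actually parses CSV is the safer choice.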