Reindexing a large CSV file

Tags: awk, csv, files, sed

I went through the answers in this helpful thread, but my problem seems different enough that I can't come up with a good answer (at least not with sed).

I have a large CSV file (200+ GB) with rows that look like the following:

<alphanumerical_identifier>,<number>

where <alphanumerical_identifier> is unique across the entire file. I would like to create a separate file that replaces the first column with an index, i.e.

<index>,<number>

so that we get:

1, <number>
2, <number>
3, <number>

Can awk generate an increasing index without loading the full file in memory?

Since the index increases monotonically, it may be even better to just drop the index altogether. Would the solution for that be very different? That is:

<number>
<number>
<number>

Best Answer

Not near a terminal to test, but how about the oft-overlooked nl command? Something like:

cut -f 2 -d , original.csv | nl -w 1 -p -s , > numbered.csv
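To address the awk part of the question directly: awk processes one record at a time, so it never needs the whole file in memory. A minimal sketch (same caveat, untested here; the file names `original.csv` and `numbered.csv` are just placeholders carried over from above):

```shell
# Replace the first field with the running record number NR.
# awk streams line by line, so memory use stays constant
# regardless of file size.
awk -F, -v OFS=, '{ print NR, $2 }' original.csv > numbered.csv
```

And if you decide to drop the index entirely, the problem reduces to the `cut` half of the pipeline on its own: `cut -d , -f 2 original.csv > numbers.csv`.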
