I want to read part of a big CSV file: the rows between n and m and the columns between p and q.
Is there an easy way to do this with the shell? (Are there commands whose documentation I should read? Otherwise, I'll write a Python script.)
Ubuntu – Extracting a part of a massive csv file from command line
Tags: command line, csv, scripts
Best Answer
I had a script that I adjusted (the (N+1)q part is a good idea!) thanks to @chronitis's comment and the SO answer:
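A minimal sketch of what such a script might look like, pieced together from the explanation further down. The variable name `FILE` and the generated sample data are assumptions, added only so that the sketch runs on its own:

```shell
#!/bin/bash
# Sketch of cut_csv: print rows N..M and columns P..Q of a CSV file.
# Assumption: FILE is a made-up sample (25 rows, 4 columns); point it at a real file instead.
FILE=$(mktemp)
for i in $(seq 1 25); do echo "r$i,a$i,b$i,c$i"; done > "$FILE"

N=10   # first row to print
M=20   # last row to print
P=2    # first column to print
Q=3    # last column to print

# Print rows N..M, quit at row M+1, then keep columns P..Q.
result=$(sed -n "${N},${M}p;$((M+1))q" "$FILE" | cut -d, -f"${P}-${Q}")
echo "$result"
rm -f "$FILE"
```

With the sample data this prints columns 2 and 3 of rows 10 to 20.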
Save the file as, for example, `cut_csv`, make it executable, and run it. It could be made fancier by accepting N, M, P, and Q as parameters, but I use it so seldom that I normally just edit the file.
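As a sketch of that fancier variant, the four values could be taken as arguments. The function form, the argument order, and the usage message below are illustrative choices, not part of the original script:

```shell
#!/bin/bash
# Hypothetical parameterised variant: cut_csv FILE N M P Q
cut_csv() {
    if [ "$#" -ne 5 ]; then
        echo "usage: cut_csv FILE N M P Q" >&2
        return 1
    fi
    local file=$1 n=$2 m=$3 p=$4 q=$5
    # Print rows n..m, quit at row m+1, then keep columns p..q.
    sed -n "${n},${m}p;$((m+1))q" "$file" | cut -d, -f"${p}-${q}"
}

# Made-up data: take rows 2-3 and columns 1-2 of a 4-row file.
sample=$(mktemp)
printf '%s\n' 'a1,b1,c1' 'a2,b2,c2' 'a3,b3,c3' 'a4,b4,c4' > "$sample"
out=$(cut_csv "$sample" 2 3 1 2)
echo "$out"
rm -f "$sample"
```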
How it works:
The main command is the last line of the script. Suppose N=10, M=20, P=2, and Q=3: the shell substitutes the variables, so the last line becomes a `sed` command containing `10,20p` and `21q`, piped to `cut` (1).
Let's start with the first command, `sed`:
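With the values above, the first stage would presumably be a `sed` call like the one in this sketch. The sample file is generated only so that the example runs on its own:

```shell
#!/bin/bash
# Made-up sample: 25 rows, 4 columns.
csv=$(mktemp)
for i in $(seq 1 25); do echo "r$i,a$i,b$i,c$i"; done > "$csv"

# -n: print nothing by default; 10,20p: print rows 10 to 20; 21q: quit at row 21.
rows=$(sed -n '10,20p;21q' "$csv")
echo "$rows"
rm -f "$csv"
```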
This calls `sed` (the stream editor, see `man sed`) in no-print mode (`-n`) and executes the following commands on the file:
- print (`p`) the lines between 10 and 20 (this is the `10,20p` part);
- quit (`q`) when reading line 21 (`21q`), which discards the rest of the file.

The output of `sed` is piped (`|`) to `cut`:
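A small sketch of the `cut` stage on its own, fed with three made-up rows instead of the output of `sed`:

```shell
#!/bin/bash
# Three made-up rows with four comma-separated columns each.
# -d, sets the field separator to a comma; -f2-3 keeps columns 2 to 3.
cols=$(printf '%s\n' 't1,a1,b1,c1' 't2,a2,b2,c2' 't3,a3,b3,c3' | cut -d, -f2-3)
echo "$cols"
```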
This command (`man cut`) selects fields of a line (and repeats this for each line). In this case, I am telling it that the separator between fields (columns) is a comma (`-d,`), and to print columns 2 to 3.

As another, more complex example, I often use this one:
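Judging from the description that follows, the command is presumably along these lines. The sample file and its column names are made up so that the sketch runs on its own:

```shell
#!/bin/bash
# Made-up sample: a header row plus 19 data rows, 9 columns each.
csv=$(mktemp)
echo "time,c2,c3,c4,c5,c6,c7,c8,c9" > "$csv"
for i in $(seq 2 20); do
    echo "t$i,$i.2,$i.3,$i.4,$i.5,$i.6,$i.7,$i.8,$i.9"
done >> "$csv"

# 1p: header row; 10,14p: rows 10 to 14; 15q: stop reading at row 15.
# -f1,4-8: column 1 plus columns 4 to 8.
picked=$(sed -n '1p;10,14p;15q' "$csv" | cut -d, -f1,4-8)
echo "$picked"
rm -f "$csv"
```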
This selects row 1 (where I have the titles :-)) and rows 10 to 14 (5 lines), then selects column 1 (the time in my data...) and columns 4 to 8. It is really powerful once you get to grips with it.
(1) One great way to see what the shell is doing is to change the first line (which is called a shebang) like this:
The shell will now print every command it reads and the result of the substitutions:
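A sketch of how this could be tried out, assuming the change meant here is adding `-x` to the shebang (`#!/bin/bash -x`). Running a script with `bash -x` has the same effect, which is what the example does. The throwaway script and the sample data are made up:

```shell
#!/bin/bash
# Made-up sample: 25 rows, 4 columns.
csv=$(mktemp)
for i in $(seq 1 25); do echo "r$i,a$i,b$i,c$i"; done > "$csv"

# A throwaway copy of the script; here the file to read is passed as $1.
script=$(mktemp)
cat > "$script" <<'EOF'
N=10
M=20
P=2
Q=3
sed -n "${N},${M}p;$((M+1))q" "$1" | cut -d, -f"${P}-${Q}"
EOF

# bash -x traces each command after substitution. The trace goes to stderr,
# so it is captured here while the normal output is discarded.
trace=$(bash -x "$script" "$csv" 2>&1 >/dev/null)
echo "$trace"
rm -f "$script" "$csv"
```

Each traced line starts with `+` and shows the command with the variables already replaced by their values.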