I want to write a script that merges the contents of several .csv files into one .csv file, i.e. appends the columns of all the other files to the columns of the first file. I tried doing so using a `for` loop but was not able to make progress with it.
Does anyone know how to do this in Linux?
Best Answer
Here's a perl script that reads in each line of each file specified on the command line and appends it to the corresponding element of the array (`@csv`). When there's no more input, it prints out each element of `@csv`. The `.csv` files will be appended in the order that they are listed on the command line.

WARNING: This script assumes that all input files have the same number of lines. Output will likely be unusable if any file has a different number of lines from any of the others.
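The answer's original code block did not survive, so here is a minimal sketch reconstructed from the description above (the `merge_csv_files` helper name is my own, not from the original):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read each line of each named file and append it, comma-separated,
# to the corresponding element of @csv. Files are merged in the
# order they are given.
sub merge_csv_files {
    my @files = @_;
    my @csv;
    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";
        my $i = 0;
        while ( my $line = <$fh> ) {
            chomp $line;
            # Append this file's line to row $i of the merged output.
            $csv[$i] = defined $csv[$i] ? "$csv[$i],$line" : $line;
            $i++;
        }
        close $fh;
    }
    return @csv;
}

# When there's no more input, print out each merged row.
print "$_\n" for merge_csv_files(@ARGV);
```

Run it as `./merge.pl file1.csv file2.csv ... > merged.csv`.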
Given two or more input files with the same number of lines, it will produce output in which each line is the comma-joined concatenation of the corresponding lines of every input file.
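The sample files from the original answer were not preserved; as a hypothetical illustration (file names and contents invented here), merging two 2-line files row by row looks like this:

```shell
# Hypothetical sample inputs:
printf '1,2\n3,4\n' > file1.csv
printf 'x,y\nz,w\n' > file2.csv

# Appending file2's columns to file1's, row by row:
paste -d, file1.csv file2.csv
# 1,2,x,y
# 3,4,z,w
```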
OK, now that you've read this far, it's time to admit that this doesn't do anything that `paste -d, *.csv` doesn't also do. So why bother with perl? `paste` is quite inflexible. If your data is exactly right for what `paste` does, you're good - it's perfect for the job and very fast. If not, it's completely useless to you.

There are any number of ways a perl script like this could be improved (e.g. handling files of different lengths by counting the number of fields in each file and adding the correct number of empty fields to `@csv` for each of the files that are missing lines, or at least detecting different lengths and exiting with an error), but this is a reasonable starting point if more sophisticated merging is required.

BTW, this uses a really simple algorithm and stores the entire contents of all input files in memory (in `@csv`) at once. For files up to a few MB each on a modern system, that's not unreasonable. If, however, you are processing HUGE .csv files, a better algorithm would be to:

- open all of the input files
- read one line from each file
- join those lines and print the result
- repeat until there is no more input
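Those steps could be sketched like this (a streaming variant, still assuming equal-length inputs; the `merge_stream` name is an invention for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream-merge the named CSV files: read one row from each file per
# iteration, so memory use stays constant no matter how big the
# files are.
sub merge_stream {
    my ( $out, @files ) = @_;
    my @fhs = map {
        open my $fh, '<', $_ or die "Can't open $_: $!";
        $fh;
    } @files;
    return unless @fhs;
    while (1) {
        my @row;
        for my $fh (@fhs) {
            my $line = <$fh>;
            # Stop at the first end-of-file (same-length assumption).
            return if !defined $line;
            chomp $line;
            push @row, $line;
        }
        print {$out} join( ',', @row ), "\n";
    }
}

merge_stream( \*STDOUT, @ARGV ) if @ARGV;
```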