I want to write a script that merges the contents of several .csv files into one .csv file, i.e. appends the columns of all the other files to the columns of the first file. I tried doing so using a `for` loop but was not able to make progress with it.
Does anyone know how to do this in Linux?
Best Answer
Here's a perl script that reads in each line of each file specified on the command line and appends it to the corresponding element of the array (`@csv`). When there's no more input, it prints out each element of `@csv`. The `.csv` files will be appended in the order that they are listed on the command line.

WARNING: This script assumes that all input files have the same number of lines. Output will likely be unusable if any file has a different number of lines from any of the others.
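The answer's original code block did not survive, so here is a minimal sketch reconstructed from the description above (the `merge_csv_files` helper name is my own, not from the original):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read each line of each named file and append it, comma-separated,
# to the corresponding element of @csv. Files are merged in the
# order they are given.
sub merge_csv_files {
    my @files = @_;
    my @csv;
    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";
        my $i = 0;
        while ( my $line = <$fh> ) {
            chomp $line;
            # Append this file's line to row $i of the merged output.
            $csv[$i] = defined $csv[$i] ? "$csv[$i],$line" : $line;
            $i++;
        }
        close $fh;
    }
    return @csv;
}

# When there's no more input, print out each merged row.
print "$_\n" for merge_csv_files(@ARGV);
```

Run it as `./merge.pl file1.csv file2.csv ... > merged.csv`.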
Given two or more input files with the same number of lines, it will produce output in which each line is the comma-joined concatenation of the corresponding lines of every input file.
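The sample files from the original answer were not preserved; as a hypothetical illustration (file names and contents invented here), merging two 2-line files row by row looks like this:

```shell
# Hypothetical sample inputs:
printf '1,2\n3,4\n' > file1.csv
printf 'x,y\nz,w\n' > file2.csv

# Appending file2's columns to file1's, row by row:
paste -d, file1.csv file2.csv
# 1,2,x,y
# 3,4,z,w
```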
OK, now that you've read this far, it's time to admit that this doesn't do anything that `paste -d, *.csv` doesn't also do. So why bother with perl? `paste` is quite inflexible. If your data is exactly right for what `paste` does, you're good - it's perfect for the job and very fast. If not, it's completely useless to you.

There are any number of ways a perl script like this could be improved (e.g. handling files of different lengths by counting the number of fields in each file and adding the correct number of empty fields to `@csv` for each of the files that are missing lines, or at least detecting different lengths and exiting with an error), but this is a reasonable starting point if more sophisticated merging is required.

BTW, this uses a really simple algorithm and stores the entire contents of all input files in memory (in `@csv`) at once. For files up to a few MB each on a modern system, that's not unreasonable. If, however, you are processing HUGE .csv files, a better algorithm would be to:

- open all of the input files
- read one line from each file
- join those lines and print the result
- repeat until there is no more input
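Those steps could be sketched like this (a streaming variant, still assuming equal-length inputs; the `merge_stream` name is an invention for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream-merge the named CSV files: read one row from each file per
# iteration, so memory use stays constant no matter how big the
# files are.
sub merge_stream {
    my ( $out, @files ) = @_;
    my @fhs = map {
        open my $fh, '<', $_ or die "Can't open $_: $!";
        $fh;
    } @files;
    return unless @fhs;
    while (1) {
        my @row;
        for my $fh (@fhs) {
            my $line = <$fh>;
            # Stop at the first end-of-file (same-length assumption).
            return if !defined $line;
            chomp $line;
            push @row, $line;
        }
        print {$out} join( ',', @row ), "\n";
    }
}

merge_stream( \*STDOUT, @ARGV ) if @ARGV;
```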