Lum – merge csv files by first column

Tags: awk, columns, csv, join, text processing

I have 3 csv files like this.

csv 1:

1,aaaa,bbb,2014-04-01
2,qwe,rty,2014-04-03
3,zxc,cvb,2014-04-05

csv 2:

2,j,k,2014-04-01
3,a,s,2014-04-04
5,g,h,2014-04-08

csv 3:

2,a,s,d,f,g,2014-04-01
3,d,f,g,h,j,2014-04-06
4,c,v,b,n,m,2014-04-09

How can I merge all by the first column?

SELECT * FROM csv1
JOIN csv2 ON csv1[0] = csv2[0] -- [0] is the position of the first column
JOIN csv3 ON csv1[0] = csv3[0]

The output should be:

 csv1 fields     | csv2 fields |  csv3 fields

 2,qwe,rty,2014-04-03,j,k,2014-04-01,a,s,d,f,g,2014-04-01
 3,zxc,cvb,2014-04-05,a,s,2014-04-04,d,f,g,h,j,2014-04-06

Best Answer

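You can do this entirely with POSIX-specified features of join, provided each file is sorted on its first column (as your samples are). The shell expands csv[12] to csv1 csv2, and the second join reads that result on standard input (-) and joins it with csv3: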

join -t, csv[12] | join -t, - csv3

Using your csv1, csv2 and csv3 files as posted, that gives:

$ join -t, csv[12] | join -t, - csv3
2,qwe,rty,2014-04-03,j,k,2014-04-01,a,s,d,f,g,2014-04-01
3,zxc,cvb,2014-04-05,a,s,2014-04-04,d,f,g,h,j,2014-04-06
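
If the files are not already sorted on the first column, join may skip or misreport lines. A minimal sketch of one way to handle that is to sort each file first and then run the same pipeline. The function name merge_by_first_column is made up for this illustration, and mktemp is assumed to be available (it is common but not specified by POSIX):

```shell
#!/bin/sh
# Sketch: inner-join three comma-separated files on their first column.
# merge_by_first_column is a hypothetical helper name, not part of join.
merge_by_first_column() {
    # Scratch directory for the sorted copies (assumes mktemp exists).
    tmp=$(mktemp -d) || return 1

    # join expects its inputs sorted on the join field, so sort each file
    # on field 1. LC_ALL=C keeps sort and join agreeing on collation order.
    LC_ALL=C sort -t, -k1,1 "$1" > "$tmp/a"
    LC_ALL=C sort -t, -k1,1 "$2" > "$tmp/b"
    LC_ALL=C sort -t, -k1,1 "$3" > "$tmp/c"

    # Same pipeline as above: join the first two, then join that with the third.
    LC_ALL=C join -t, "$tmp/a" "$tmp/b" | LC_ALL=C join -t, - "$tmp/c"

    rm -rf "$tmp"
}
```

Called as merge_by_first_column csv1 csv2 csv3, this should give the same two lines as the pipeline above, with the output ordered by the (textually sorted) key.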

Related Solutions

How to merge first two lines of a csv column-by-column

Try this

$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

The same code is more readable when split across a few lines:

$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

On the 1st line, fix the Citty -> City typo and split the line into the elements of array a.

On the 2nd line, starting with the 2nd column, append the corresponding column from the 1st line, a space, this column, and a comma. Repeat for every second column (2, 4, 6, ...), then strip the trailing , and print the result.

On every line after the 2nd, replace a leading , and every "<spaces>", with an empty string, then print the result.

Tested OK on GNU Awk 4.0.2. Note that gensub is a gawk extension, so other awk implementations will not run this as written.
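
Because gensub is specific to gawk, here is a hedged sketch of the same three rules using only sub/gsub-style functions that POSIX awk provides. The input file is not shown in this answer, so the sample below is a guess reconstructed from what the script expects (real data in the even-numbered columns, quoted blanks in between). The name merge_header is likewise invented for the illustration:

```shell
#!/bin/sh
# Sketch: POSIX awk version of the header-merging script (no gensub).
# merge_header is a hypothetical name; it reads files given as arguments,
# or standard input when called without arguments.
merge_header() {
    awk -F, '
    NR == 1 {                 # 1st line: fix the typo, remember the fields
        gsub(/Citty/, "City")
        split($0, top, FS)
        next
    }
    NR == 2 {                 # 2nd line: pair up columns 2, 4, 6, ...
        line = ""
        for (i = 2; i <= NF; i += 2)
            line = line (i > 2 ? "," : "") top[i] " " $i
        print line
        next
    }
    {                         # later lines: drop the leading , and each "<spaces>",
        gsub(/^,|" *",/, "")
        print
    }' "$@"
}

# Assumed sample input, NOT taken from the original post.
merge_header <<'EOF'
,Product," ",Citty," ",Price
,Name," ",Location," ",Per Unit
,banana," ",CA," ",5.7
,apple," ",FL," ",2.3
EOF
```

With that assumed input, the sketch should print the same three lines as the gawk version. Building the header with a leading separator from the second pair onward avoids having to strip a trailing comma afterwards.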
