How to merge first two lines of a csv column-by-column

I have an excel file that I converted to csv. When converted, it looks like the following example (Please note that there are 100+ columns in the csv. This is a minified version):

,Product,"  ",Citty,"   ",Price
,Name," ",Location,"    ",Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

I need to write a script that will take the first & second line and "merge" them together based on their comma position:

,Product Name," ""  ",Citty Location,"  ""  ",Price Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

I've looked at other questions on here and stack overflow, but the answers don't seem to pertain to this weird column-by-column situation for just the first 2 lines of the file.

As an additional unrelated task, I'd also like to get rid of the empty columns in the csv and fix the spelling error so that it looks like this:

Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3

(The csv currently has a tab surrounded by quotes between every actual column of data except for the first column, which is just empty followed by a comma).

I will be receiving the csv with the spelling error multiple times, so I would like to programmatically fix the error in the script. Please also note that the columns may not always be in the order shown above, so I need to dynamically check each column name for the error during the script.

$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp Product Name,City Location,Price Per Unit banana,CA,5.7 apple,FL,2.3 $

$ awk -F, ' > NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)} > NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)} > NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp Product Name,City Location,Price Per Unit banana,CA,5.7 apple,FL,2.3 $

Best Answer

Try this

Same code is more readable if split across a few lines :

If 1st line, split the line into array elements within a. Fix the Citty->City typo.

If 2nd line, starting with the 2nd column, print the corresponding column from 1st line together with this column. Repeat for each column, going in 2 column increments. Strip the trailing ,.

After 2nd line, replace any leading , or any "<spaces>", with an empty string and then print the result.

Tested ok on GNU Awk 4.0.2

Try it online!

Best Answer

Related Solutions

Lum – merge csv files by first column

Related Question