How to merge first two lines of a csv column-by-column

csvtext processing

I have an excel file that I converted to csv. When converted, it looks like the following example (Please note that there are 100+ columns in the csv. This is a minified version):

,Product,"  ",Citty,"   ",Price
,Name," ",Location,"    ",Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

I need to write a script that will take the first & second line and "merge" them together based on their comma position:

,Product Name," ""  ",Citty Location,"  ""  ",Price Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

I've looked at other questions on here and stack overflow, but the answers don't seem to pertain to this weird column-by-column situation for just the first 2 lines of the file.


As an additional unrelated task, I'd also like to get rid of the empty columns in the csv and fix the spelling error so that it looks like this:

Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3

(The csv currently has a tab surrounded by quotes between every actual column of data except for the first column, which is just empty followed by a comma).

I will be receiving the csv with the spelling error multiple times, so I would like to programmatically fix the error in the script. Please also note that the columns may not always be in the order shown above, so I need to dynamically check each column name for the error during the script.

Best Answer

Try this

$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

Same code is more readable if split across a few lines :

$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

If 1st line, split the line into array elements within a. Fix the Citty->City typo.

If 2nd line, starting with the 2nd column, print the corresponding column from 1st line together with this column. Repeat for each column, going in 2 column increments. Strip the trailing ,.

After 2nd line, replace any leading , or any "<spaces>", with an empty string and then print the result.

Tested ok on GNU Awk 4.0.2

Try it online!

Related Question