sed:
sed '2,$s/^\("[^"]*","\)/\1'"'"/ test.in
Using EREs to get rid of some of the escaping:
sed -E '2,$s/^("[^"]*",")/\1'"'"/ test.in
awk:
awk -F, 'NR>1{sub(/^"/,"\"'"'"'",$2)}1' test.in
If you don't want to worry about the quoting, use the escape code:
awk -F, '{sub(/^"/,"\"\x27",$2)}1' test.in
Try this
$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
Same code is more readable if split across a few lines :
$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
If 1st line, split the line into array elements within a. Fix the Citty->City typo.
If 2nd line, starting with the 2nd column, print the corresponding column from 1st line together with this column. Repeat for each column, going in 2 column increments. Strip the trailing ,
.
After 2nd line, replace any leading ,
or any "<spaces>",
with an empty string and then print the result.
Tested ok on GNU Awk 4.0.2
Try it online!
Best Answer
There's no real issue with your file. It has embedded newlines and double quotes. A CSV parser would be able to handle it properly. Escaping double quotes with
"
(while double quoting the field) is the proper way to escape embedded double quotes in a CSV file.To replace the embedded newlines in your CSV file with a
@
character, you could do this:This uses
csvformat
from the csvkit toolbox. It's a proper CSV parser that is able to reformat CSV files.The command pipeline above first replaces all newlines that are not embedded with the
@
character. Then I usetr
to swap the remaining newlines and the@
characters with each other, ending up with a CSV file whose embedded newlines are@
.This relies on the fact that the original data in the file contains no
@
characters.If you then want to have spaces instead of a marker of where the newlines originally were, then use
tr '\n@' ' \n'
instead of thetr
shown above:Note that this would make it extremely difficult to re-insert the original newlines if there are other spaces in the data (as there is in the third field on the first line).
Would you prefer that
csvformat
did not remove all unnecessary double quotes, then use it with-U 1
: