Using sed, I'd like to substitute every comma that is outside of double quotes for a pipe.
So that this line in .csv file:
John,Tonny,"345.3435,23",56th Street
Would be converted to:
John|Tonny|"345.3435,23"|56th Street
Could you help me with the regex for that?
Best Answer
If your
sed
supports the-E
option (-r
in some implementations):The
Is a very common
sed
idiom. It keeps doing the same substitution in a loop as long as it's successful.Here, we're substituting the leading part of the line made of 0 or more quoted strings or characters other that
"
and,
(captured in\1
) followed by a,
with that\1
capture and a|
, so on your sample that means:John,Tonny,"345.3435,23",56th Street
->John|Tonny,"345.3435,23",56th Street
John|Tonny,"345.3435,23",56th Street
->John|Tonny|"345.3435,23",56th Street
John|Tonny|"345.3435,23",56th Street
->John|Tonny|"345.3435,23"|56th Street
With
perl
, you could do it with one substitution with theg
flag with:Here, assuming quotes are balanced in the input, the pattern would match all the input, breaking it up in either:
,
or"
And only when the matched string is a comma (when
$1
is not defined in the replacement part), replace it with a|
.