Shell – Replace the comma with vertical bar |, except when inside double quotes, and remove double quotes

csvsedshellshell-script

File1

12584,"Capital of America, Inc.",,HORIZONCAPITAL,USA,......etc
25841,"Capital of America, Inc.",,HORIZONCAPITAL,USA,......etc
87455,"Capital of America, Inc.",,HORIZONCAPITAL,USA,......etc

Output

12584|Capital of America, Inc.||HORIZONCAPITAL|USA|......etc
25841|Capital of America, Inc.||HORIZONCAPITAL|USA|......etc
87455|Capital of America, Inc.||HORIZONCAPITAL|USA|......etc

I have a csv file, which i have to conver into a text file delimited with pipe(|)
I have done the shell script sed 's/^/"/;s/,/|/g;s/$/"/' $File > $Output

But the problem is the field "Capital of America, Inc." contains a comma, that's also replaced by the pipe (|).
So I just wanted to replace all, with pipe except not inside the value is given double quotes " ".

Is there any shell script to do this?

Best Answer

Using csvkit:

$ csvformat -D '|' file.csv
12584|Capital of America, Inc.||HORIZONCAPITAL|USA|......etc
25841|Capital of America, Inc.||HORIZONCAPITAL|USA|......etc
87455|Capital of America, Inc.||HORIZONCAPITAL|USA|......etc

csvkit is a collection of CSV manipulation/querying tools written in Python. These do proper CSV parsing and csvformat may be used to replace the default comma delimiter with any other character. The utility will make sure that the result is properly quoted according to CSV rules.

Related Question