Text Processing – How to Remove Commas Embedded Within Quotes in a CSV File

awkcsvsedtext processing

I have a input file delimited with commas (,). There are some fields enclosed in double quotes that are having a comma in them. Here is the sample row

123,"ABC, DEV 23",345,534.202,NAME

I need to remove all the comma's occuring within inside the double quotes and the double quotes as well. So the above line should get parsed into as shown below

123,ABC DEV 23,345,534.202,NAME

I tried the following using sed but not giving expected results.

sed -e 's/\(".*\),\(".*\)/\1 \2/g'

Any quick tricks with sed, awk or any other unix utility please?

Best Answer

If the quotes are balanced, you will want to remove commas between every other quote, this can be expressed in awk like this:

awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "", $i) } 1' infile

Output:

123,ABC DEV 23,345,534.202,NAME

Explanation

The -F" makes awk separate the line at the double-quote signs, which means every other field will be the inter-quote text. The for-loop runs gsub, short for globally substitute, on every other field, replacing comma (",") with nothing (""). The 1 at the end invokes the default code-block: { print $0 }.

Related Question