If the quotes are balanced, you will want to remove commas between every other quote, this can be expressed in awk
like this:
awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "", $i) } 1' infile
Output:
123,ABC DEV 23,345,534.202,NAME
Explanation
The -F"
makes awk separate the line at the double-quote signs, which means every other field will be the inter-quote text. The for-loop runs gsub
, short for globally substitute, on every other field, replacing comma (","
) with nothing (""
). The 1
at the end invokes the default code-block: { print $0 }
.
This (adapted from here) should do what you need though @rici's Perl one is much simpler:
$ sed -r ':a;s/(("[0-9,]*",?)*"[0-9,]*),/\1/;ta; s/""/","/g;
s/"([0-9]*)",?/\1,/g ' file
56,72,123454,x,y,"foo,a,b,bar"
56,92,1234,x,y,"foo,a,b,bar"
56,72,12345478765467,x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar",123454,
56,72,x,y,"foo,a,b,bar",123454,45578492,"bar,foo"
Explanation
:a
: define a label called a
.
s/(("[0-9,]*",?)*"[0-9,]*),/\1/
: This one needs to be broken down
- First of all, using this construct :
(foo(bar))
, \1
will be foobar
and \2
will be bar
.
"[0-9,]*",?
: match 0 or more of 0-9
or ,
, followed by 0 or 1 ,
.
("[0-9,]*",?)*
: match 0 or more of the above.
"[0-9,]*
: match 0 or more of 0-9
or ,
that come right after a "
ta;
: go back to the label a
and run again if the substitution was successful.
s/""/","/g;
: post-processing. Replace ""
with ","
.
s/"([0-9]*)",?/\1,/g
: remove all quotes around numbers.
This might be easier to understand with another example:
$ echo '"1,2,3,4"' | sed -nr ':a;s/(("[0-9,]*",?)*"[0-9,]*),/\1/;p;ta;'
"1,2,34"
"1,234"
"1234"
"1234"
So, while you can find a number that is right after a quote and followed by a comma and another number, join the two numbers together and repeat the process until it is no longer possible.
At this point I believe it is useful to mention a quote from info sed
that appears in the section describing advanced functions such as the label used above (thanks for finding if @Braiam):
In most cases, use of these commands indicates that you are probably
better off programming in something like `awk' or Perl.
Best Answer
tail
+tr
:tail -n+2
prints the file starting from line two to the end.tr -d \"
deletes all double quotes.