Awk when both delimiter and quotes are used for a field

awk csv

I have a file in the following format:

field1|field2|field3
field1|"field2|field2"|field3

Notice that the second row contains double quotes. The string within the double quotes belongs to field 2. How do I extract it using awk? I've been googling with no results. I also tried this, with no luck:

FS='"| "|^"|"$' '{print $2}'  

Best Answer

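If you have gawk 4.0 or later, you're in luck. Its FPAT variable lets you describe what a field looks like instead of what separates fields. It is documented in the gawk manual under "Defining Fields by Content".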

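awk 'BEGIN {
    # A field is either a run of non-pipe characters or a double-quoted string
    FPAT = "([^|]+)|(\"[^\"]+\")"
}
{
    print "NF = ", NF
    for (i = 1; i <= NF; i++) {
        # Strip the surrounding quotes, if any, before printing
        sub(/"$/, "", $i)
        sub(/^"/, "", $i)
        printf("$%d = %s\n", i, $i)
    }
}' file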

Output for the two sample rows:

NF =  3
$1 = field1
$2 = field2
$3 = field3
NF =  3
$1 = field1
$2 = field2|field2
$3 = field3
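
If your awk has no FPAT (mawk, BSD awk, older gawk), something along these lines may work instead. This is only a sketch and a different technique from FPAT: it splits on | as usual, then glues back together the pieces of any field that starts with a double quote until the closing quote shows up. The function name second_field is made up for illustration, and the sketch assumes quoted fields contain no escaped quotes or embedded newlines.

```shell
#!/bin/sh
# second_field is a hypothetical helper name, not a standard tool.
# It reads pipe-delimited rows on stdin and prints logical field 2 of each row.
second_field() {
    awk -F'|' '
    {
        n = 0                       # logical fields seen so far in this row
        for (i = 1; i <= NF; i++) {
            piece = $i
            # A piece that opens a quote but does not close it was cut in two
            # by the delimiter, so keep appending the following pieces.
            while (piece ~ /^"/ && piece !~ /^".*"$/ && i < NF) {
                i++
                piece = piece "|" $i
            }
            gsub(/^"|"$/, "", piece)    # drop the surrounding quotes
            if (++n == 2) print piece
        }
    }'
}

# Usage with the two rows from the question:
printf '%s\n' 'field1|field2|field3' 'field1|"field2|field2"|field3' | second_field
# prints:
#   field2
#   field2|field2
```

Change the 2 in the last test to pick a different field. For real CSV with escaped quotes or multi-line fields, a proper CSV parser is probably the safer choice.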