I have a file that looks something like this:
ID101 G T freq=.5 nonetype ANC=.1 addinfor
ID102 A T freq=.3 ANC=.01 addinfor
ID102 A T freq=.01 type=1 ALT=0.022 ANC=.02 addinfor
As you can see, each line has a slightly different number of columns. I specifically want columns 1 through 4 and the column that starts with ANC=.
Desired output:
ID101 G T freq=.5 ANC=.1
ID102 A T freq=.3 ANC=.01
ID102 A T freq=.01 ANC=.02
I generally use an awk command to parse files:
awk 'BEGIN {OFS = "\t"} {print $1, $2, $3, $4}'
Is there an easy way to alter this command to work for situations like this?
I think something like this might work:
awk '{for(j=1;j<=NF;j++){if($j~/^ANC=/){print $j}}}'
However, how can I edit this to also print the first columns?
Best Answer
With awk:
for(...): loops through all fields, starting with field 5 (i=5).
if($i~/^ANC=/): checks if the field starts with ANC=.
a=$i: if yes, set variable a to that value.
print $1,$2,$3,$4,a: print fields 1-4 followed by whatever is stored in a.
Can be combined with BEGIN {OFS="\t"} of course.
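The steps described in this answer can be assembled into a single command. The following is a sketch rather than the answerer's exact command: the file name input.txt is hypothetical, and the a = "" reset at the start of each line is an addition so that a line without an ANC= field does not reuse the value from the previous line.

```shell
# Sample input from the question; input.txt is a hypothetical file name.
cat > input.txt <<'EOF'
ID101 G T freq=.5 nonetype ANC=.1 addinfor
ID102 A T freq=.3 ANC=.01 addinfor
ID102 A T freq=.01 type=1 ALT=0.022 ANC=.02 addinfor
EOF

# Clear a for each line (an addition, see above), scan fields 5..NF for one
# that starts with ANC=, then print fields 1-4 followed by the match.
awk '{ a = ""; for (i = 5; i <= NF; i++) if ($i ~ /^ANC=/) a = $i; print $1, $2, $3, $4, a }' input.txt

# Same logic with tab-separated output, set via OFS in a BEGIN block.
awk 'BEGIN { OFS = "\t" } { a = ""; for (i = 5; i <= NF; i++) if ($i ~ /^ANC=/) a = $i; print $1, $2, $3, $4, a }' input.txt
```

On the sample input, the first command prints the three lines shown under "Desired output" in the question; the second prints the same fields separated by tabs.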