Unix awk begin statement

awk

I was trying to use a BEGIN statement in awk, but the header line is printed before every record instead of only once at the top of the output, which is what I wanted. I am not sure why; I would be grateful if someone could tell me what is wrong in my code.

awk 'BEGIN { OFS="\t" }{print "MARKER\tCHR\tBP\tEA\tNEA\tEAF\tP\tOR\tSE\tOR_95L\tOR95U\tN\tN_CASES\tN_CONTROLS\tSTRAND\tINFO\tHWE_P\tIMPUTED"}FNR>16 && $45!="NA" && $9>=0.4 { if ($1=="---") print $2,"'"$chr"'",$4,$6,$5,$45,$42,$48,$43,$44,$18,$23,$28,"+",$9,$33,"0" ; else print $2,"'"$chr"'",$4,$6,$5,$45,$42,$48,$43,$44,$18,$23,$28,"+",$9,$33,"1" }' ./out/expected_dcct_1kg_only${chr}_${chunk}.res > ./temp/expected_dcct_1kg_chr${chr}_${chunk}.tmp

I was hoping to see the line:

Marker Chr BP …. 1

on the first line only, but somehow it seems to be printed for every record.

This is a snapshot of the output:

MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED
rs7002152   8   145000056   C   T   0.937422    0.984021    0.165311    0.71094 1.362   1304    79  1225    +   0.981763    0.309615    0
MARKER  CHR BP  EA  NEA EAF P   OR  SE  OR_95L  OR95U   N   N_CASES N_CONTROLS  STRAND  INFO    HWE_P   IMPUTED

Best Answer

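That's because your print is not inside the BEGIN block; all the BEGIN block contains is OFS="\t". The print sits in a separate block with no pattern, and awk runs a pattern-less block for every input record. Setting OFS="\t" also means you don't need to write "\t" inside the print call: separate the arguments with commas and awk inserts the tabs for you. So, what you're after is (I changed the formatting a bit for clarity):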

awk 'BEGIN { OFS="\t"; 
            print "MARKER", "CHR", "BP", "EA", "NEA","EAF", "P","OR","SE", 
             "OR_95L","OR95U", "N","N_CASES", "N_CONTROLS", "STRAND","INFO", 
             "HWE_P","IMPUTED"
        }
FNR>16 && $45!="NA" && $9>=0.4 { 
    if ($1=="---") print $2,"'"$chr"'",$4,$6, $5,$45,$42,$48, 
                     $43,$44,$18,$23, $28,"+",$9,$33,"0" ; 
    else print $2,"'"$chr"'",$4,$6,$5,$45, $42,$48,$43,$44,$18, 
               $23,$28,"+",$9,$33,"1" 
}' ./out/expected_dcct_1kg_only${chr}_${chunk}.res > \
  ./temp/expected_dcct_1kg_chr${chr}_${chunk}.tmp
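
To see the difference in isolation, here is a minimal sketch using a made-up three-line input rather than the real .res file. The sample data, the function names, and the reduced column list are assumptions for illustration only. It also passes the shell variable with awk's -v option, a common alternative to splicing "'"$chr"'" into the script through the quotes.

```shell
chr=8   # hypothetical chromosome number, standing in for the question's $chr

# Hypothetical sample input: three whitespace-separated records.
sample() { printf 'rs1 100\nrs2 200\nrs3 300\n'; }

# Header print in a block with no pattern: awk runs it for every record.
header_every_record() {
    sample | awk -v chr="$chr" '
        BEGIN { OFS = "\t" }
        { print "MARKER", "CHR", "BP" }   # no pattern: runs once per record
        { print $1, chr, $2 }'
}

# Header print inside BEGIN: awk runs it once, before any input is read.
header_once() {
    sample | awk -v chr="$chr" '
        BEGIN { OFS = "\t"; print "MARKER", "CHR", "BP" }
        { print $1, chr, $2 }'
}

header_every_record
echo "---"
header_once
```

The first function prints the header before each of the three records, as in the question; the second prints it once, followed by the three data lines.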