How to use ^#$ as record separator in awk

How do you tell awk to use a line consisting of a single # character as
the record separator? You can't say RS="^#$" because ^ matches the
beginning of the file, not the beginning of a line, and RS="#\n" doesn't
work either because it also matches a # at the end of a line that contains
other text. Take this data, for example:

$ data='#
first record, first field
first record, second field
#
second record, first field#
second record, second field
'

Then print the first field of each record, using RS="#\n":

$ printf "%s" "$data" | awk '
  BEGIN { RS="#\n"; FS="\n" }
  /./ {print $1}
  '
first record, first field
second record, first field
second record, second field

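The last line is wrong: it is the second field of the second record, not a
first field, and the trailing # of the line before it has been swallowed as
part of a separator. The intended output was: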

first record, first field
second record, first field#

Best Answer

Here's one way of doing it in awk:

$ printf "%s\n" "$data" |
    awk -F'\n' -v RS='(^|\n)#\n' '/./ {print $1}'
first record, first field
second record, first field#

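The trick is to set the record separator to a regular expression that only matches a # on a line of its own: either the beginning of the file (^) or a newline, followed by a # and another newline (\n). Using a regular expression as RS is an extension supported by gawk and mawk, so this will not work in every awk.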
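
Where a regular-expression RS is not available (POSIX leaves the behavior of a multi-character RS unspecified), the same output can be produced by keeping the default line-by-line records and treating a line that is exactly # as the boundary. This is a different technique from the one above, sketched under the assumption that only the first line of each record is wanted; the variable names first and n are made up for the illustration.

```shell
data='#
first record, first field
first record, second field
#
second record, first field#
second record, second field
'

out=$(printf '%s' "$data" | awk '
  # A line that is exactly "#" closes the current record.
  $0 == "#" { if (n) print first; n = 0; next }
  # Remember the first line seen since the last separator.
  n++ == 0  { first = $0 }
  # Flush the last record, which has no closing "#".
  END       { if (n) print first }
')
printf '%s\n' "$out"
```

Because the comparison is against the whole line, the # at the end of "second record, first field#" is left alone.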

Related Solutions

BEGIN and END with the awk command

The BEGIN isn't superfluous. If you don't specify BEGIN, the print is executed for every line of input instead of once.

Quoting from the manual:

A BEGIN rule is executed once only, before the first input record is read. Likewise, an END rule is executed once only, after all the input is read.

$ seq 5 | awk 'BEGIN{print "Hello"}/4/{print}'   # Hello printed once
Hello
4
$ seq 5 | awk '{print "Hello"}/4/{print}'        # Hello printed for each line of input
Hello
Hello
Hello
Hello
4
Hello
$
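
The END half of the quoted rule behaves the same way. A small sketch combining both, summing the numbers produced by seq 5 (the variable name sum is arbitrary):

```shell
out=$(seq 5 | awk '
  BEGIN { print "start" }        # runs once, before any input is read
        { sum += $1 }            # runs for every input line
  END   { print "sum:", sum }    # runs once, after the last line
')
printf '%s\n' "$out"
```

This prints "start" followed by "sum: 15"; the middle rule has no pattern, so it is the only one that runs per line.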

Unix awk begin statement

That's because you did not put your print in the BEGIN block; all you have there is OFS="\t". Setting OFS, by the way, means you don't need to add "\t" to your print call. So what you're after is (I changed the formatting a bit for clarity):

awk 'BEGIN { OFS="\t"; 
            print "MARKER", "CHR", "BP", "EA", "NEA","EAF", "P","OR","SE", 
             "OR_95L","OR95U", "N","N_CASES", "N_CONTROLS", "STRAND","INFO", 
             "HWE_P","IMPUTED"
        }
FNR>16 && $45!="NA" && $9>=0.4 { 
    if ($1=="---") print $2,"'"$chr"'",$4,$6, $5,$45,$42,$48, 
                     $43,$44,$18,$23, $28,"+",$9,$33,"0" ; 
    else print $2,"'"$chr"'",$4,$6,$5,$45, $42,$48,$43,$44,$18, 
               $23,$28,"+",$9,$33,"1" 
}' ./out/expected_dcct_1kg_only${chr}_${chunk}.res > \
  ./temp/expected_dcct_1kg_chr${chr}_${chunk}.tmp
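
The same pattern on a toy input may make the roles of BEGIN and OFS easier to see. This is only a sketch: the three-column sample lines and the value of chr are invented, and the shell variable is passed with -v chr="$chr" rather than by splicing quotes into the script as above, which is a substitution on my part and not part of the original command.

```shell
chr=7   # hypothetical value standing in for the shell variable above

out=$(printf '%s\n' 'rs1 100 A' 'rs2 200 NA' | awk -v chr="$chr" '
  BEGIN { OFS = "\t"; print "MARKER", "CHR", "BP" }   # header, printed once
  $3 != "NA" { print $1, chr, $2 }                    # commas insert OFS
')
printf '%s\n' "$out"
```

The header and the one surviving data line come out tab-separated without a single "\t" in either print call.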
