First two fields to be separated by _ and rest of the fields as they are

awk, bioinformatics, sed, text processing

#CHROM  POS     REF     ALT     ../S101_sorted.bam      ../S102_sorted.bam          ../S105_sorted.bam      ../S107_sorted.bam      ../S113_sorted.bam      ../S114_sorted.bam      ../S115_sorted.bam      ../S
Aradu.A01       296611  T       C       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T/C     T       T/C     T       T       T       T
Aradu.A01       326689  T       C       T/C     T       T       T       T/C     T       T       T       T/C     T/C     T       T       T       T       T       T       T       T/C     T/C     T       T
Aradu.A01       615910  T       G       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T       T
Aradu.A01       661394  T       A       T       T       T       T       T       T/A     T       T       T       T       T       T       T       T       T       T       T       T       T       T       T
Aradu.A01       941674  C       T       C       C/T     C       C       C/T     C       C       C       C       C       C       C       C       C       C       C       C       C       C       C       C
Aradu.A01       942064  C       T       C/T     C/T     C/T     C/T     C/T     C       C       C/T     C       C/T     C/T     C       C       C/T     C/T     C       C       C       C       C/T     C/T
Aradu.A01       954858  G       A       G/A     G       G       G       G       G       G       G       G       G       G       G       G       G       G       G       G/A     G       G       G       G
Aradu.A01       1196780 C       A       C/A     C       C       C       C       C       C       C       C       C       C       C/A     C       C       C/A     C       C       C       C       C       C

I have a file in the above format and I am trying to print the first two columns separated by _ and the rest of the columns as they are. I tried the following awk script, but it does not return the output I expect.

awk '{if (NR>1) print $1"_"$2; for(i=3;i<NF;i++) printf "\t", $i}' input_file > out_file.

Can anyone please suggest what I am doing wrong here?

Best Answer

To change the whitespace between the first two columns to an underscore, I suggest sed:

sed -e 's/[\t ]\+/_/'

And if you need to leave the header line untouched:

sed -e '/^#/! s/[\t ]\+/_/'

Or, for the more general case (the header might start with any character; \t works only with GNU sed, so [[:blank:]] is used instead):

sed -E '1! s/[[:blank:]]+/_/'
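
As a quick check of the last command, here is a minimal sketch run against a tiny tab-separated sample. The file names sample.tsv and sed_out.tsv are made up for illustration, and the sample keeps only three columns of the data shown in the question. Because the substitution has no g flag, only the first run of blanks on each line is replaced.

```shell
# Hypothetical sample: a header line plus one data line, tab-separated.
printf '#CHROM\tPOS\tREF\n' > sample.tsv
printf 'Aradu.A01\t296611\tT\n' >> sample.tsv

# Skip line 1 (the header); on every other line replace only the
# first run of blanks with an underscore.
sed -E '1! s/[[:blank:]]+/_/' sample.tsv > sed_out.tsv
```

The header line should come through unchanged, and the data line should begin with Aradu.A01_296611 followed by the remaining tab-separated field.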

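As for your awk code: the first print should be a printf, so that it does not emit an ill-timed newline right after $1"_"$2. In addition, printf "\t", $i has no %s in its format string, so the remaining fields are never printed (only tabs are), and the loop condition i<NF stops one field short of the end (use i<=NF).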
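
If you would rather stay with awk, the points above could be put together roughly as follows. This is only a sketch: the file names sample.tsv and out.tsv are placeholders, the sample is cut down to four columns, and dropping the header line entirely (NR > 1) is an assumption based on the condition in your original script.

```shell
# Hypothetical sample: a header line plus two data lines, tab-separated.
printf '#CHROM\tPOS\tREF\tALT\n' > sample.tsv
printf 'Aradu.A01\t296611\tT\tC\n' >> sample.tsv
printf 'Aradu.A01\t326689\tT\tC\n' >> sample.tsv

# For every line after the header: print fields 1 and 2 joined by "_"
# without a newline, then each remaining field (through NF inclusive)
# preceded by a tab, and finally a single newline.
awk 'NR > 1 {
    printf "%s_%s", $1, $2
    for (i = 3; i <= NF; i++) printf "\t%s", $i
    printf "\n"
}' sample.tsv > out.tsv
```

Because the NR > 1 pattern guards the whole block, the header is skipped for the loop as well, not only for the first print.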
