Shell – Splitting a sequence into fixed width

awk, shell-script, text-processing

I have a file like this which is a two-column tab-separated file.

CTGCAGTTTCCCCAAATGTGGGAAACTTGACTGTATAATTTGTGGCAGTGGTA   a1
GATTTCCCCAAATGTGGGAAACTCACTCGGCAGGCGTTGATA  a2

I want to get an output like this:

>a1
CTGCAGTTTCCCCAAATGTG
GGAAACTTGACTGTATAATT
TGTGGCAGTGGTA
>a2
GATTTCCCCAAATGTGGGAA
ACTCACTCGGCAGGCGTTGA
TA

I was trying to use the fold command inside awk. Is it possible to use another command within awk?

Also, the width of each line I want is 15, so I tried something like this, but it didn't work:

awk -F "\t" '{a=$(fold -w 50 $1);print a,$2}' file.txt 

How can I do this?

Best Answer

Using Python: save the following as test.py and run it with python test.py < input:

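import sys

for line in sys.stdin:
    # Each line holds a sequence and its identifier, separated by whitespace.
    s, ident = line.split()
    print('>{0}'.format(ident))
    # Emit the sequence in chunks of 15 characters.
    while s:
        print(s[:15])
        s = s[15:]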
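
The same slicing idea can be wrapped in a function that takes the line width as a parameter, which makes it easy to switch between 15, 20, or any other width. This is only a minimal sketch and not part of the original answer: the names fold_record and width are made up for illustration, and it assumes every input line holds exactly one sequence and one identifier separated by whitespace.

```python
def fold_record(line, width=15):
    """Return the header and folded sequence lines for one input record.

    fold_record and width are illustrative names (an assumption of this
    sketch), not something defined in the original answer.
    """
    # Assumes exactly two whitespace-separated fields: sequence, identifier.
    seq, ident = line.split()
    out = [">" + ident]
    # Step through the sequence in chunks of `width` characters.
    for start in range(0, len(seq), width):
        out.append(seq[start:start + width])
    return out


# Second record from the question, folded at width 20.
sample = "GATTTCCCCAAATGTGGGAAACTCACTCGGCAGGCGTTGATA\ta2"
for out_line in fold_record(sample, width=20):
    print(out_line)
```

With width=20 this prints the >a2 block exactly as shown in the question's expected output. To process a whole file, the function could be called on each line read from sys.stdin.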