I would like to print the pattern of Cys residues for each line in file.tsv. file.tsv has two columns: sequenceID and Sequence. In the second column, the first "C" of the sequence should be printed as C; if the residue immediately after it is not C, the code should print C#. The # should occur only once, no matter how many non-C amino acids follow.
So whenever a "C" in the Sequence column is followed by a different character, I would like to print # after the "C". For example, if the Sequence column has the value DCFRCGHCC, the third column should contain C#C#CC.
Example input:
c32_g1_i1_ 3GQKAKLKVPVFFLHRRGSICSSFYLMFSFEIKKK*TSKN*CFVCVRVRNRERAGVKCAHVYCPMFNGTQTH*IIISSLNS
c32_g1_i1_ 6AV*TADDDLVRLCSIEHGTIHMCTLYTCCTLTVTHTYTHKTLIFACLFFFNFKGEHQIERAANRTSSM*KKHRNF*LGLLAX
The output should be three columns: sequenceID, Sequence, Cys pattern
c32_g1_i1_3,GQKAKLKVPVFFLHRRGSICSSFYLMFSFEIKKK*TSKN*CFVCVRVRNRERAGVKCAHVYCPMFNGTQTH*IIISSLNS,C#C#C#C#C
c32_g1_i1_6,AV*TADDDLVRLCSIEHGTIHMCTLYTCCTLTVTHTYTHKTLIFACLFFFNFKGEHQIERAANRTSSM*KKHRNF*LGLLAX,C#C#CC#C
Best Answer
The first solution (a one-liner and its full-script equivalent) parses and converts the file format described in the question; the second (a full script) parses and converts a FASTA file.
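As a rough illustration of the rule in the question, here is a minimal Python sketch. It makes a few assumptions: the ID and the sequence are separated by a single tab, residues are uppercase, and (going by the expected output in the question) anything before the first C or after the last C produces no #. The names `cys_pattern` and `convert_line`, and the ID `seq1`, are hypothetical and only used for this example.

```python
import re


def cys_pattern(sequence: str) -> str:
    """Keep every C; collapse each run of non-C residues between two Cs into one '#'."""
    first = sequence.find("C")
    if first == -1:
        return ""  # no Cys at all: empty pattern (assumption)
    last = sequence.rfind("C")
    # Only the stretch from the first C to the last C contributes to the pattern.
    core = sequence[first:last + 1]
    return re.sub(r"[^C]+", "#", core)


def convert_line(line: str) -> str:
    """Turn 'ID<TAB>SEQUENCE' into 'ID,SEQUENCE,PATTERN' (tab separator is assumed)."""
    seq_id, sequence = line.rstrip("\n").split("\t")
    return f"{seq_id},{sequence},{cys_pattern(sequence)}"


# Example taken from the question; 'seq1' is a made-up ID.
print(convert_line("seq1\tDCFRCGHCC"))  # prints: seq1,DCFRCGHCC,C#C#CC
```

Trimming to the first and last C before substituting is what keeps leading and trailing residues from adding a stray # at either end.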
#1
Golfed one-liner:
Expanded full script:
Explanation:
Sample output:
#2
Expanded full version:
Explanation:
If the current line starts with a > character, a space is appended to the line; if a following line exists and doesn't start with a > character, the newline character is stripped from the current line; the current line is printed to a temporary file.
Sample output:
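For the FASTA case, the idea of joining each record's wrapped sequence lines onto one line before computing the pattern can be sketched in Python as below. This is only a sketch under assumptions: it joins the lines in memory rather than through a temporary file, takes the first whitespace-delimited word of the header as the ID, and uses the same trimming rule as the expected output in the question. The names `read_fasta`, `cys_pattern` and `fasta_to_csv`, and the sample records, are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple


def cys_pattern(sequence: str) -> str:
    """Keep every C; collapse each run of non-C residues between two Cs into one '#'."""
    first = sequence.find("C")
    if first == -1:
        return ""
    return re.sub(r"[^C]+", "#", sequence[first:sequence.rfind("C") + 1])


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence), joining sequences wrapped over several lines."""
    record_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            # The first word of the header is used as the ID (assumption).
            record_id, chunks = (words[0] if words else ""), []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def fasta_to_csv(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'ID,SEQUENCE,PATTERN' for every FASTA record."""
    for record_id, sequence in read_fasta(lines):
        yield f"{record_id},{sequence},{cys_pattern(sequence)}"


# Made-up records; the first sequence is wrapped over two lines.
demo = [">a some description\n", "DCFR\n", "CGHCC\n", ">b\n", "AAA\n"]
for row in fasta_to_csv(demo):
    print(row)
```

To run it on a real file, something like `with open(path) as handle: rows = list(fasta_to_csv(handle))` should work, since an open text file iterates line by line.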