I have a file with multiple sequences. The problem is that after the ID there is a space and then the actual sequence; I want to add a line break between the ID and the actual sequence.
This is what I have:
UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA
This is what I want it to look like:
UniRef90_Q8YC41 Putative binding protein BMEII0691
MNRFIAFFRSVFLIGLVATAFGRACA
If it's possible, I would rather it look like this:
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
Best Answer
Using awk, printing the first and last field with \n as the delimiter.
Using sed, capturing the first and last field while matching and using them in the replacement.
With perl, similar logic to sed.
Using bash, a slower approach: creating an array from each line and printing the first and last elements of the array, separated by \n.
With python, creating a list of the whitespace-separated elements from each line, then printing the first and last elements of the list, separated by \n.
Example:
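A minimal sketch of the Python approach described above, not the answer's original code. It assumes the sequence is always the last whitespace-separated field and contains no spaces itself; the names `split_record`, `reformat`, and `sample` are made up for illustration.

```python
def split_record(line):
    """Return (identifier, sequence) taken from the first and last fields."""
    fields = line.split()  # split on any run of whitespace
    if len(fields) < 2:
        return None  # blank line or no separate sequence: nothing to split
    return fields[0], fields[-1]


def reformat(lines):
    """Yield each record as 'identifier\\nsequence', skipping unusable lines."""
    for line in lines:
        record = split_record(line)
        if record is not None:
            yield "\n".join(record)


# Sample input taken from the question (assumed to be one record per line).
sample = [
    "UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA",
]
for text in reformat(sample):
    print(text)
```

To process a real file, an open file handle can be passed to `reformat` in place of `sample`, since iterating over a file yields its lines. To keep the full description on the ID line instead (the first layout asked for), the first element could be replaced with `" ".join(fields[:-1])`.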