Ubuntu – How to add a line break after the header of a sequence and before the actual sequence

command line, text processing

I have a file with multiple sequences. The problem is that after the ID there is a space and then the actual sequence on the same line. I want to add a line break between the ID and the actual sequence.

This is what I have:

UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA

This is what I want it to look like:

UniRef90_Q8YC41 Putative binding protein BMEII0691
MNRFIAFFRSVFLIGLVATAFGRACA

If it's possible, I would rather it look like this:

UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

Best Answer

  • Using awk, printing the first and last fields, each followed by \n:

    awk '{printf "%s\n%s\n", $1, $NF}' file.txt
    
  • Using sed, capturing the first and last fields in the match and using them in the replacement (\n in the replacement is a GNU sed feature):

    sed -E 's/([^[:blank:]]+).*[[:blank:]]([^[:blank:]]+)$/\1\n\2/' file.txt
    
  • With perl, similar logic to sed:

    perl -pe 's/^([^\s]+).*\s([^\s]+)/$1\n$2/' file.txt
    
  • Using bash (a slower approach), reading each line into an array and printing the first and last elements of the array, separated by \n:

    while read -ra line; do printf '%s\n%s\n' "${line[0]}" \
           "${line[$((${#line[@]}-1))]}"; done <file.txt
    
  • With python, splitting each line into a list of whitespace-separated fields, then printing the first and last elements of the list, separated by \n:

    #!/usr/bin/env python3
    with open("file.txt") as f:
        for line in f:
            line = line.split()
            print(line[0]+'\n'+line[-1])
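
The commands above all produce the preferred ID-only form. For the first layout in the question, where the description stays on the header line, a minimal Python sketch could split each line at its last run of whitespace. The function name split_record and its keep_description parameter are made up for illustration, and the sketch assumes the sequence is always the last field and contains no spaces:

```python
def split_record(line, keep_description=True):
    """Split one record into (header, sequence).

    Assumption: the sequence is the last whitespace-separated field.
    split_record and keep_description are hypothetical names.
    """
    # Split once, from the right, at the last run of whitespace.
    parts = line.strip().rsplit(None, 1)
    if len(parts) < 2:
        # Blank or single-field line: there is nothing to separate.
        return None
    header, sequence = parts
    if not keep_description:
        # Keep only the ID, i.e. the first field of the header.
        header = header.split()[0]
    return header, sequence


if __name__ == "__main__":
    sample = ("UniRef90_Q8YC41 Putative binding protein BMEII0691 "
              "MNRFIAFFRSVFLIGLVATAFGRACA")
    header, sequence = split_record(sample)
    print(header)
    print(sequence)
```

Passing keep_description=False gives the ID-only header instead. Lines with fewer than two fields return None, so a caller can skip blank lines.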
    

Example:

$ cat file.txt                               
UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA

$ awk '{printf "%s\n%s\n", $1, $NF}' file.txt                             
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

$ sed -E 's/([^[:blank:]]+).*[[:blank:]]([^[:blank:]]+)$/\1\n\2/' file.txt
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

$ perl -pe 's/^([^\s]+).*\s([^\s]+)/$1\n$2/' file.txt
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA


$ while read -ra line; do printf '%s\n%s\n' "${line[0]}" "${line[$((${#line[@]}-1))]}"; done <file.txt
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

>>> with open("file.txt") as f:
...     for line in f:
...         line = line.split()
...         print(line[0]+'\n'+line[-1])
... 
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA