Ubuntu – How to add a line break after the header of a sequence and before the actual sequence

command line, text processing

I have a file with multiple sequences. The problem is that each id is followed by a space and then the actual sequence on the same line. I want to add a line break between the id and the actual sequence.

This is what I have:

UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA

This is what I want it to look like:

UniRef90_Q8YC41 Putative binding protein BMEII0691
MNRFIAFFRSVFLIGLVATAFGRACA

If it's possible, I would rather it look like this:

UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

Best Answer

Using awk, printing the first and last fields with \n as the delimiter:
```
awk '{printf "%s\n%s\n", $1, $NF}' file.txt
```
Using sed (GNU sed, as shipped with Ubuntu), capturing the first and last fields in the match and reusing them in the replacement:
```
sed -E 's/([^[:blank:]]+).*[[:blank:]]([^[:blank:]]+)$/\1\n\2/' file.txt
```

With perl, similar logic to sed:

perl -pe 's/^([^\s]+).*\s([^\s]+)/$1\n$2/' file.txt

Using bash, a slower approach: read each line into an array, then print the first and last elements of the array separated by \n:
```
while read -ra line; do printf '%s\n%s\n' "${line[0]}" \
       "${line[$((${#line[@]}-1))]}"; done <file.txt
```
With python, splitting each line into a list of whitespace-separated elements, then printing the first and last elements of the list separated by \n:
```
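#!/usr/bin/env python3
# Print the first and last whitespace-separated field of each line.
with open("file.txt") as f:
    for line in f:
        fields = line.split()
        if not fields:  # skip blank lines instead of raising IndexError
            continue
        print(fields[0], fields[-1], sep="\n")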
```

Example:

$ cat file.txt                               
UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41 Putative binding protein BMEII0691 MNRFIAFFRSVFLIGLVATAFGRACA

$ awk '{printf "%s\n%s\n", $1, $NF}' file.txt                             
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

$ sed -E 's/([^[:blank:]]+).*[[:blank:]]([^[:blank:]]+)$/\1\n\2/' file.txt
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

$ perl -pe 's/^([^\s]+).*\s([^\s]+)/$1\n$2/' file.txt
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA


$ while read -ra line; do printf '%s\n%s\n' "${line[0]}" "${line[$((${#line[@]}-1))]}"; done <file.txt
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA

>>> with open("file.txt") as f:
...     for line in f:
...         line = line.split()
...         print(line[0]+'\n'+line[-1])
... 
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
UniRef90_Q8YC41
MNRFIAFFRSVFLIGLVATAFGRACA
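
The commands above all produce the second layout from the question (id only). For the first layout, which keeps the description on the header line, one option is to split each line at its last run of whitespace. A minimal Python sketch, assuming the sequence is always the last field and contains no spaces; the function name `split_header` is made up for illustration:

```python
def split_header(line):
    """Split one record into (header, sequence) at the last run of whitespace.

    Assumption: the line has at least two fields and the sequence is the
    last one; a single-field line would raise ValueError on unpacking.
    """
    header, sequence = line.rstrip().rsplit(None, 1)
    return header, sequence


if __name__ == "__main__":
    record = ("UniRef90_Q8YC41 Putative binding protein BMEII0691 "
              "MNRFIAFFRSVFLIGLVATAFGRACA")
    header, sequence = split_header(record)
    print(header)
    print(sequence)
```

Run on the sample line, this prints the id and description on the first line and the sequence on the second.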

Related Solutions

Ubuntu – How to find a word and add text after it in a .txt file

1. Using `sed` append

If you want a completely new line containing /period after each line that contains /dn, use sed's append command:

sed '\:/dn:a/period' filename

The output for your sample would be:

/dn
/period
/name

1. Notes

\:/dn: matches lines containing /dn; the leading backslash lets : act as the regex delimiter, so the / in the pattern needs no escaping.
a/period appends /period as a new line after each matching line.
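
For comparison, the same append-after-match idea can be sketched in Python. This is an illustration only, not part of the original answer: the function name `append_after_match` is hypothetical, and it uses a plain substring test where sed uses a regular expression.

```python
def append_after_match(lines, pattern, new_line):
    """Return a new list with new_line inserted after every line containing pattern.

    Assumption: pattern is matched as a plain substring, not as a regex.
    """
    result = []
    for line in lines:
        result.append(line)
        if pattern in line:
            result.append(new_line)
    return result


if __name__ == "__main__":
    for out in append_after_match(["/dn", "/name"], "/dn", "/period"):
        print(out)
```

On the two-line sample (/dn, /name) this yields /dn, /period, /name, one per line.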

2. Search and append to the end of next line

If you want /period added to the end of the next line instead, use it like this:

sed '\:/dn: { N; s:$: /period: }' filename

Here is a sample input:

/dn
/name

and the output:

/dn
/name /period

2. Notes

First we search for lines containing /dn. N then pulls the next line into the pattern space, and the substitution adds /period at the end ($) of that line.
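
A rough Python equivalent of this second variant, again only a sketch: `append_to_next_line` is a hypothetical name, matching is by substring rather than regex, and the suffix includes a leading space to reproduce the sample output shown above.

```python
def append_to_next_line(lines, pattern, suffix):
    """Append suffix to the line that follows each line containing pattern.

    Like sed's N, the following line is consumed together with the match,
    so it is not itself tested against pattern. A match on the last line
    has no following line and is passed through unchanged.
    """
    result = []
    it = iter(lines)
    for line in it:
        result.append(line)
        if pattern in line:
            following = next(it, None)
            if following is not None:
                result.append(following + suffix)
    return result


if __name__ == "__main__":
    for out in append_to_next_line(["/dn", "/name"], "/dn", " /period"):
        print(out)
```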
