I have a text file, and I want to extract the string that follows "OS=" on each line.
Input file lines:
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
Desired output:
OS=Arundo donax
OS=Setaria italica
OR
Arundo donax
Setaria italica
Best Answer
Use GNU grep (or a compatible grep) with an extended regex, or with a basic regex (where you need to escape + as \+).
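A minimal sketch of the two variants follows. It assumes the input is saved as input.txt (a hypothetical file name) and that each organism name is exactly two alphabetic words, as in the sample lines. The exact patterns are assumptions, not necessarily the answer's original commands.

```shell
# Recreate the sample input (input.txt is a hypothetical name)
cat > input.txt <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
EOF

# Extended regex (-E): "OS=" followed by two words; -o prints only the match
grep -Eo 'OS=[[:alpha:]]+ [[:alpha:]]+' input.txt
# prints:
# OS=Arundo donax
# OS=Setaria italica

# Basic regex: same pattern, but "+" must be written as "\+" (GNU extension)
grep -o 'OS=[[:alpha:]]\+ [[:alpha:]]\+' input.txt
```

Because the pattern counts words, it would cut off a longer name such as a three-word species plus subspecies. The OX=-based approaches avoid that limitation.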
To get everything from OS= up to OX=, you can use grep with Perl-compatible regular expressions (PCRE, the -P option), if available, together with a lookahead. Alternatively, use grep to match up to and including OX=, and then remove that part with sed.
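Here is a sketch of both approaches on the same sample, again assuming the hypothetical input.txt. The lookahead includes the space before OX=, so no trailing space is printed. That choice, and the \K variant for the second desired form (names without OS=), are assumptions and are not stated in the answer. Both -P and \K require a grep built with PCRE support.

```shell
# Recreate the sample input (input.txt is a hypothetical name)
cat > input.txt <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
EOF

# PCRE: lazy .*? stops at the first " OX="; the lookahead (?= OX=)
# requires it to follow but does not include it in the printed match
grep -oP 'OS=.*?(?= OX=)' input.txt
# prints:
# OS=Arundo donax
# OS=Setaria italica

# Without -P: match up to and including " OX=", then strip that suffix with sed
grep -Eo 'OS=.* OX=' input.txt | sed 's/ OX=$//'

# PCRE with \K: discard "OS=" from the reported match, leaving only the name
grep -oP 'OS=\K.*?(?= OX=)' input.txt
# prints:
# Arundo donax
# Setaria italica
```

The grep | sed pipeline uses a greedy .*, which is fine here because each line contains only one OX= field.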