Ubuntu – Using sed to remove all lines containing special characters, numbers, and spaces

bash, command line, scripts, sed

So I'm fairly new to using the shell (bash specifically), and I'm in the process of writing a script which will translate files containing DNA sequences into a more useful format. Unfortunately, many of these files will contain extraneous lines used for labeling information, etc. I need a sed command that will exclude any lines containing special characters, numbers, or spaces. I've found that it is fairly straightforward to remove lines with spaces by using

sed '/ /d' infile

and I imagine that removing lines containing numbers would use a similar strategy with a regex. I just haven't found any way of handling special characters in sed.

Thanks

Best Answer

To delete any line that is not composed entirely of alphabetic characters, you need to anchor the pattern at the start (^) and end ($) of the line:

sed '/^[[:alpha:]]*$/!d' file

Equivalently, you could delete any line that contains at least one non-alphabetic character:

sed '/[^[:alpha:]]/d' file

Note that the caret ^ inside the bracket expression acts as a negation operator here, rather than as an anchor as in the previous expression.
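
As a quick check, both sed expressions can be run on a small made-up input. The sample lines and the variable names (input, anchored, negated) are assumptions for illustration only, not taken from the question.

```shell
# Hypothetical sample: a label line, a pure-letter line, a line with a space,
# a line with digits, an empty line, and another pure-letter line.
input=$(printf '%s\n' '>seq1 label' 'ACGTACGT' 'ACG T' 'ACGT123' '' 'ttgaca')

# Anchored form: delete every line that is NOT letters from start to end.
anchored=$(printf '%s\n' "$input" | sed '/^[[:alpha:]]*$/!d')

# Negated bracket form: delete every line containing a non-letter.
negated=$(printf '%s\n' "$input" | sed '/[^[:alpha:]]/d')

# Print the surviving lines from the anchored form.
printf '%s\n' "$anchored"
```

Both forms should keep the same lines here. Note that the empty line survives in both cases: `*` matches zero characters, and an empty line contains no non-alphabetic character to trigger the deletion.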


Alternatively, using grep's whole-line (-x or --line-regexp) option

grep -x '[[:alpha:]]*' file

(equivalent of the first sed expression) or using an inverse match (-v)

grep -v '[^[:alpha:]]' file

(equivalent of the second sed expression).
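
A small sketch comparing the two grep forms, plus a variant that also drops empty lines by requiring at least one letter. The sample lines and the variable names (input, whole, inverse, nonempty) are made up for illustration, and the non-empty variant is an addition that was not part of the original answer.

```shell
# Hypothetical sample: a label line, a pure-letter line, a line with a space,
# a line with digits, an empty line, and another pure-letter line.
input=$(printf '%s\n' '>seq1 label' 'ACGTACGT' 'ACG T' 'ACGT123' '' 'ttgaca')

# Whole-line match: keep lines made up only of letters (empty lines included).
whole=$(printf '%s\n' "$input" | grep -x '[[:alpha:]]*')

# Inverse match: drop lines containing any non-letter.
inverse=$(printf '%s\n' "$input" | grep -v '[^[:alpha:]]')

# Variant: require at least one letter, so empty lines are dropped as well.
nonempty=$(printf '%s\n' "$input" | grep -x '[[:alpha:]]\{1,\}')

# Print the lines kept by the stricter variant.
printf '%s\n' "$nonempty"
```

If the files should contain nothing but nucleotide codes, the bracket expression could presumably be narrowed to an explicit set such as [ACGTNacgtn], though which letters are valid depends on the data.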