Text Processing with Sed – Deleting Commented Lines in a Text File

sed, text processing

I am a sed novice. I would like to use sed to delete commented lines in a text file.

In my series of text files, commented lines begin with either the # symbol or the @ symbol (i.e., they are files originally intended for plotting in xmgrace). Suppose I have the following sample file (called sample.txt):

# Comment 1
# Another comment
# Yet another comment
@ More comments
@ Still more comments
data1
data2

data3

I would like to use sed to modify sample.txt, in place, to obtain the following:

data1
data2

data3

where there is a blank line between "data2" and "data3" in both versions of sample.txt.

With inspiration from this Stack Overflow question, I came up with this series of (consecutive) commands:

sed -i 's/#.*$//' sample.txt
sed -i 's/@.*$//' sample.txt

which give this resulting file:

    <blank line>
    <blank line>
    <blank line>
    <blank line>
    <blank line>
data1
data2

data3

where <blank line> is just a blank line. (I had to write it this way so that Stack Exchange would format the output correctly.)

The problem with the above is that the commented lines are only replaced with blank lines, not deleted altogether.

Another possible series of (consecutive) commands is:

sed -i 's/#.*$//' sample.txt
sed -i 's/@.*$//' sample.txt
sed -i '/^$/d' sample.txt

from which I obtain the following output:

data1
data2
data3

The above is, however, also wrong because the blank line between "data2" and "data3" has not been preserved (I am incorrectly deleting all blank lines, indiscriminately). How do I just delete the commented lines?

I am running Ubuntu Linux. Thanks for your time!

Best Answer

Use the sed delete command (here assuming GNU sed as found on Ubuntu and other GNU systems).

sed -i '/^[@#]/ d' sample.txt

If you need to account for leading whitespace (\s is a GNU extension; [[:space:]] is the POSIX equivalent):

sed -i '/^\s*[@#]/ d' sample.txt
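A quick way to check this without touching real data is to rebuild the question's sample.txt in a scratch directory and inspect the result. The sketch below is only an illustration: the scratch directory and the cleaned.txt name are assumptions made for the demo, and it writes to a new file instead of using -i. It uses [[:space:]] in place of \s so that it does not depend on GNU sed.

```shell
#!/bin/sh
# Scratch directory and output file name are made up for this demo.
workdir=$(mktemp -d)

# Recreate the question's sample.txt, including the blank line before data3.
printf '%s\n' \
    '# Comment 1' '# Another comment' '# Yet another comment' \
    '@ More comments' '@ Still more comments' \
    'data1' 'data2' '' 'data3' > "$workdir/sample.txt"

# d deletes the whole line, newline included; s/#.*$// would only empty it.
sed '/^[[:space:]]*[@#]/d' "$workdir/sample.txt" > "$workdir/cleaned.txt"

cat "$workdir/cleaned.txt"
```

This should print data1, data2, an empty line, and data3: the blank line survives because the address only selects lines whose first non-whitespace character is # or @. Once the output looks right, the same expression can be run with -i on the real file.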

Related Solutions

Sed command that would ignore any commented match

You should not believe them if they tell you it cannot be done. You should believe them, however, if they tell you it's not easy.

sed '\|*/|!{ s|/\*|\n&|              #if ! */ repl 1st /* w/ \n/*
     h;      s|foo|bar|g;/\n/!b      #hold; repl all foo/bar; if ! \n branch
     G;      s|\n.*\n||;:n           #Get; clear difference; :new label
     n;      \|*/|!bn;s|^|\n/*|      #new line; if ! */ branch new label
     };s|*/|\n&|g                    #repl all */ w/ \n*/
       s|foo|&\nbar|g;:r             #repl all foo w/ foo\nbar
       s|\(/\*[^\n]*\)\nbar|\1|g;tr  #repl all /*[^\n]*\nbar w/ foo
       s|foo\n\(b\)|\1|g             #repl all foo\nbar w/ bar
       s|^\n/.||;s|\n||g             #clear any \n inserts
'    <<\INPUT
asfoo   /* asdfooasdfoo


asdfasdfoo
asdfasdfoo
foo */foo /*foo*/ foo
/*.
foo*/
foo
hello

INPUT

OUTPUT

asbar   /* asdfooasdfoo


asdfasdfoo
asdfasdfoo
foo */bar /*foo*/ bar
/*.
foo*/
bar
hello

Does AWK have similar ability as SED to find line ranges based on text in line rather than line number

The OP's problem was caused by the file using CR (\r / ASCII 13) instead of LF (\n / ASCII 10) as the line terminator that sed expects. CR line endings were the convention in classic Mac OS; as a non-Mac user, the only place I've met them in the wild in the last two decades is in PDF files, where they greatly complicate any naive PDF parser written in perl (unlike RS in mawk and gawk, $/ in perl cannot be a regex).

As to the question from the title, yes, awk supports range patterns, and you can freely mix regexps and line number predicates (or any expression) in them. For example:

NR==1,/rex/   # all lines from the 1st up to (and including)
              # the one matching /rex/

/rex/,0   # from the line matching /rex/ up to the end-of-file.
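As an illustration, the two range forms above can be tried on a few lines of made-up input. The words fed in below are placeholders chosen for the demo, not data from the original question.

```shell
#!/bin/sh
# Five lines of placeholder input; "rex" is the line both ranges key on.
input=$(printf '%s\n' one two rex three four)

# From line 1 through the first line matching /rex/.
head_out=$(printf '%s\n' "$input" | awk 'NR==1,/rex/')

# From the first line matching /rex/ to end of file (0 is never true,
# so the range never closes).
tail_out=$(printf '%s\n' "$input" | awk '/rex/,0')

printf 'NR==1,/rex/ prints:\n%s\n\n' "$head_out"
printf '/rex/,0 prints:\n%s\n' "$tail_out"
```

The first range should print one, two, rex; the second should print rex, three, four.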

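awk's ranges differ from sed's: in awk the end predicate is also tested against the line that started the range, so a range can begin and end on the same line, whereas sed only starts looking for the end on the following line. sed's behavior can be emulated with: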

s=/start/, !s && /last/ { s = 0; print }

However, ranges in awk are still quite limited because they are not real expressions (they cannot be negated, made part of other expressions, used in if(...), etc.). Also, there is no magic: if you want to express something like a range with "context" (e.g. /start/-4,/end/+4), you'll have to roll your own circular buffer and extra logic.
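The difference between awk's and sed's handling of a starting line that also matches the end pattern can be seen side by side in a small sketch. The input lines and the /a/ and /b/ patterns are placeholders chosen for the demo; the third command is the emulation from this answer with those patterns substituted for /start/ and /last/.

```shell
#!/bin/sh
# Placeholder input: the line "ab" matches both the start (/a/) and end (/b/) patterns.
input=$(printf '%s\n' x ab c b d)

# awk tests the end pattern on the starting line too, so the range is one line long.
awk_out=$(printf '%s\n' "$input" | awk '/a/,/b/')

# sed starts looking for the end pattern on the line after the start.
sed_out=$(printf '%s\n' "$input" | sed -n '/a/,/b/p')

# Emulation: s is 1 only while the starting line is being tested, which
# keeps the end pattern from firing on that same line.
emu_out=$(printf '%s\n' "$input" | awk 's=/a/, !s && /b/ { s = 0; print }')

printf 'awk range:\n%s\n\nsed range:\n%s\n\nawk emulating sed:\n%s\n' \
    "$awk_out" "$sed_out" "$emu_out"
```

With this input the plain awk range should print only ab, while sed and the emulation should both print ab, c, b.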
