Sed Commands – Why Grouping After an Address in a Block Fails

sed

I’m trying to use sed to print all lines up to, but not including, the first line that matches a specific pattern. I don’t understand why the following doesn’t work:

sed '/PATTERN/{d;q}' file

According to my understanding of sed scripting, this expression should cause the following:

  • When a line matches /PATTERN/, execute the group consisting of commands to
    1. delete the pattern space (= the current line)
    2. quit after printing the current pattern space

In isolation, both /PATTERN/d and /PATTERN/q work; that is, d deletes the offending line, and q causes sed to terminate, but only after printing the line, as documented. Grouping the two operations together in a block, however, seemingly causes the q to be ignored.

I know that I can use Q instead of {d;q} as a GNU extension (and this works as expected!) but I’m interested in understanding why the above doesn’t work, and in what way I am misinterpreting the documentation.


My actual use-case is (only slightly) more complex, since the first line of the file actually matches the pattern, and I’m skipping that (after doing some replacement):

sed -e '1{s/>21/>chr21/; n}' -e '/>/{d;q}' in.fasta >out.fasta

But the above, simplified case exhibits the same behaviour.

Best Answer

To output all lines of a file up to the first line that matches a particular pattern (without outputting that matching line), you may use

sed -n '/PATTERN/q; p;' file

Here, the default output of the pattern space at the end of each cycle is disabled with -n. Instead we explicitly output each line with p. If the given pattern matches, we halt processing with q; because -n is in effect, q exits without printing the matching line.
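As a quick check, here is the same command on made-up input. The three sample lines and the pattern STOP are only illustrative assumptions, standing in for your file and PATTERN:

```shell
# Hypothetical input: STOP stands in for PATTERN.
result=$(printf 'one\ntwo\nSTOP\nthree\n' | sed -n '/STOP/q; p')

# Shows the lines before the first match; the STOP line itself is not printed.
printf '%s\n' "$result"
```

Only the lines before the first match should appear, and sed stops reading input at that point.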

Your actual, longer, command, which changes the name of chromosome 21 from just 21 to chr21 on the first line of a fasta file, and then proceeds to extract the DNA for that chromosome until it hits the next fasta header line, may be written as

sed -n -e '1 { s/^>21/>chr21/p; d; }' \
       -e '/^>/q' \
       -e p <in.fasta >out.fasta

or

sed -n '1 { s/^>21/>chr21/p; d; }; /^>/q; p' <in.fasta >out.fasta
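A small sketch of what this does, using an invented two-record FASTA stream instead of your in.fasta (the header descriptions and sequences are made up for illustration):

```shell
# Hypothetical FASTA input with two records; only the first should survive.
fasta=$(printf '>21 sample header\nACGT\nTTGA\n>22 other header\nGGCC\n')

extracted=$(printf '%s\n' "$fasta" |
    sed -n '1 { s/^>21/>chr21/p; d; }; /^>/q; p')

# Shows the renamed header followed by the sequence lines of the first record.
printf '%s\n' "$extracted"
```

Note that the p flag on the substitution prints the first line only if the substitution actually succeeds, so a file whose first header does not start with >21 would lose that header line.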

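The issue with your original expression is that d does more than delete the pattern space: it also ends the current cycle immediately. The remaining commands in the script are skipped, the next line is read into the pattern space, and execution jumps back to the start of the script. This means q is never executed. GNU's Q works for you because it quits without printing, so no d is needed before it.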
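You can see this for yourself by comparing the grouped form with a plain d. This is only a sketch on made-up input, with STOP standing in for PATTERN:

```shell
# Hypothetical input: STOP stands in for PATTERN.
input=$(printf 'one\ntwo\nSTOP\nthree\n')

# d ends the cycle at once, so the q that follows it is never reached ...
grouped=$(printf '%s\n' "$input" | sed '/STOP/{d;q;}')

# ... which makes the group behave like a plain d.
plain=$(printf '%s\n' "$input" | sed '/STOP/d')

printf '%s\n' "$grouped"
```

Both commands should drop only the STOP line and carry on to the end of the input, which is the behaviour you observed.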

Also, to be syntactically correct on non-GNU systems, your original script should look like /PATTERN/ { d; q; }, with a ; added after q (the spaces are not significant). This only affects portability; the q would still never run.
