The point of using multiple exclamation marks in sed


The POSIX sed documentation says:

A function can be preceded by one or more '!' characters, in which
case the function shall be applied if the addresses do not select the
pattern space. Zero or more <blank> characters shall be accepted
before the first '!' character. It is unspecified whether <blank>
characters can follow a '!' character, and conforming applications
shall not follow a '!' character with <blank> characters.

So, with any POSIX sed, we can write:

```shell
sed -e '/pattern/!d' file
```

It's the same as writing:

```shell
sed -e '/pattern/!!d' file
```

And !!!d, or any number n of exclamation marks, still works (tested with three sed versions from the Heirloom Toolchest). I don't see any benefit in using multiple exclamation marks instead of one.
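For comparison, the single-! form behaves the same everywhere: /pattern/!d deletes every line the address does not select, so it keeps exactly the lines that sed -n '/pattern/p' would print. The sample input below is made up for the illustration.

```shell
#!/bin/sh
# Sketch: '/pattern/!d' applies d to every line the address does NOT
# select, so only the matching lines survive. The input is hypothetical.
input='one pattern
two
three pattern'

# Both commands print "one pattern" and "three pattern".
printf '%s\n' "$input" | sed -e '/pattern/!d'
printf '%s\n' "$input" | sed -n -e '/pattern/p'
```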

Why did the spec allow that syntax, and how is it useful in real-world applications?


It seems that GNU sed is not compliant in this case; it complains if we use multiple exclamation marks:

```
$ sed -e '/pattern/!!d' file
sed: -e expression #1, char 11: multiple `!'s
```
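A script that has to run on several systems could probe for this before relying on runs of !. The helper name below is hypothetical. The probe depends only on the behaviour in the POSIX text quoted above and on the GNU sed error shown here.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the sed on PATH accepts a run of
# '!' before a function, as the POSIX text requires.
sed_accepts_multi_bang() {
    # On line 1, '1!!d' means the same as '1!d', so a conforming sed
    # leaves the line alone and prints "x". GNU sed rejects the script
    # and exits non-zero, so the && short-circuits and we fail.
    out=$(printf 'x\n' | sed -e '1!!d' 2>/dev/null) && [ "$out" = x ]
}

if sed_accepts_multi_bang; then
    echo accepts
else
    echo rejects
fi
```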

Best Answer

sed's API is primitive - and this is by design. At least, it has remained primitive by design - whether it was designed primitively at inception I cannot say. In most cases the writing of a sed script which, when run, will output another sed script is a simple matter indeed. sed is very often applied in this way by macro preprocessors such as m4 and/or make.
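Here is a minimal illustration of one sed writing a script for another. One sed turns a word list into deletion commands, and a second sed runs them. The words and the file name drop.sed are made up for the example. Real input would need any regexp metacharacters escaped first.

```shell
#!/bin/sh
# Hypothetical word list -> sed script: each word becomes '/word/d'.
printf '%s\n' dog switch | sed 's|.*|/&/d|' >./drop.sed

# drop.sed now holds the two lines /dog/d and /switch/d.
cat ./drop.sed

# Run the generated script: lines containing dog or switch are deleted,
# so only "camel" and "upper" are printed.
printf '%s\n' camel 'dog cat' switch upper | sed -f ./drop.sed
```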

(What follows is a highly hypothetical use case: it is a problem engineered to suit a solution. If it feels like a stretch to you, then that is probably because it is, but that doesn't necessarily make it any less valid.)


Consider the following input file:

```shell
cat <<"" >./infile   # empty delimiter: the here-document ends at the first empty line
camel
cat dog camel
dog cat
switch
upper
lower

```

Suppose we wanted to write a sed script that appends the suffix -case to each appropriate word in the above input file, but only where the word appears on a line in the appropriate context. Suppose also that we wanted to do this as efficiently as possible, as we should during a compile operation, for example. In that case we should avoid applying /regexp/s as much as possible.

One thing we might do is pre-edit the file on our system right now, and never call sed at all during compilation. But if any of those words in the file should or should not be included based on local settings and/or compile-time options, then doing so would likely not be a desirable alternative.

Another thing we might do is process the file against regexps now. We can produce - and include in our compilation - a sed script that applies edits according to line number, which is typically a far more efficient route in the long run.

For example:

```shell
n=$(printf '\\\n\t')   # backslash, newline, tab; ${n%?} drops the tab
grep -En 'camel|upper|lower' <infile |
sed "   1i${n%?}#!/usr/heirloom/bin/posix2001/sed -nf
        s/[^:]*/:&$n&!n;&!b&$n&/;s/://2;\$a${n%?}q"'
        s/ *cat/!/g;s/ *dog/!/g
        s| *\([cul][^ ]*\).*|s/.*/\1-case/p|'
```
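It helps to run the grep stage on its own to see what the sed stage of that pipeline works from. The grep stage emits one N:line record per matching line. The sed stage turns each N into the :N label, the N!n;N!bN skip loop and the N address of that record's block, and it turns each leading cat or dog into an extra !. The printf below just recreates ./infile so the snippet stands alone.

```shell
#!/bin/sh
# Recreate the input file used above.
printf '%s\n' camel 'cat dog camel' 'dog cat' switch upper lower >./infile

# First stage only: -n prefixes each matching line with its line number.
# Prints 1:camel, 2:cat dog camel, 5:upper and 6:lower. Lines 3 and 4
# do not match, so they get no block in the generated script.
grep -En 'camel|upper|lower' <./infile
```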

...which writes output in the form of a sed script that looks like this:

```sed
#!/usr/heirloom/bin/posix2001/sed -nf
:1
    1!n;1!b1
    1s/.*/camel-case/p
:2
    2!n;2!b2
    2!!s/.*/camel-case/p
:5
    5!n;5!b5
    5s/.*/upper-case/p
:6
    6!n;6!b6
    6s/.*/lower-case/p
q
```

When that output is saved on my machine to an executable text file named ./bang.sed and run as ./bang.sed ./infile, the output is:

```
camel-case
upper-case
lower-case
```

Now you might ask me... Why would I want to do that? Why would I not just anchor grep's matches? Who uses camel-case anyway? And to each question I could only reply, I have no idea... because I don't. Before reading this question I had never personally noticed the multi-! parsing requirement in the spec - I think it's a pretty neat catch.

The multi-! thing did immediately make sense to me, though: much of the sed specification is geared toward simply parsed and simply generated sed scripts. You'll probably find that the required \newline delimiters for [wr:bt{] make a lot more sense in that context, and if you keep that idea in mind you might make better sense of some other aspects of the spec (such as : accepting no addresses, and q accepting no more than one).

In the example above I write out a certain form of sed script that sed only ever reads through once. If you look hard at it you might notice that, as sed reads the edit-file, it progresses from one command block to the next - it never branches away from a block, nor completes its edit script, until it is completely through with the edit-file.
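The skip-forward loop that each block of the generated script uses can be tried in isolation. It uses only single ! characters, so it also runs under GNU sed. The function name, the chosen line numbers and the input are all hypothetical.

```shell
#!/bin/sh
# Minimal sketch of the skip-forward idiom from the generated script.
# In each ":N" block, "N!n" reads the next line while the current line
# is not N, and "N!bN" loops back to ":N". Once line N is reached, the
# block's real command runs and control falls through to the next block,
# so the input is traversed once, front to back, in a single cycle.
print_lines_2_and_4() {
    printf '%s\n' one two three four five |
    sed -n '
:2
    2!n;2!b2
    2p
:4
    4!n;4!b4
    4p
q'
}

# Prints "two" and then "four".
print_lines_2_and_4
```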

I consider that multi-! addresses might be more useful in that context than in some others, but, in honesty, I can't think of a single case in which I might have put it to very good use - and I sed a lot. I also think it noteworthy that GNU/BSD seds both fail to handle it as specified - this is probably not an aspect of the spec which is in much demand, and so if an implementation overlooks it I doubt very seriously their bugs@ box will suffer terribly as a result.
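One practical consequence can be sketched under stated assumptions. POSIX gives any run of ! the same meaning as a single !, so a generated script such as bang.sed can be made acceptable to GNU sed by collapsing each run of ! to one. (As the question shows, GNU sed rejects !!.) This blanket substitution is only safe because no regexp or replacement text in this particular script contains !!. The output file name bang-gnu.sed is hypothetical.

```shell
#!/bin/sh
# Sketch, assuming GNU sed (which rejects '!!'). Collapsing each run of
# '!' to a single '!' keeps the script's meaning under POSIX rules.
# Only safe because no regexp or replacement text here contains '!!'.

# Recreate the input file used above.
printf '%s\n' camel 'cat dog camel' 'dog cat' switch upper lower >./infile

# The generated script shown above. The quoted delimiter keeps the
# shell from touching '!', '$' or '\' inside the here-document.
cat >./bang.sed <<'EOF'
#!/usr/heirloom/bin/posix2001/sed -nf
:1
    1!n;1!b1
    1s/.*/camel-case/p
:2
    2!n;2!b2
    2!!s/.*/camel-case/p
:5
    5!n;5!b5
    5s/.*/upper-case/p
:6
    6!n;6!b6
    6s/.*/lower-case/p
q
EOF

# Collapse every run of '!' to one '!'; "2!!s" becomes "2!s".
sed 's/!!*/!/g' ./bang.sed >./bang-gnu.sed

# Expected: camel-case, upper-case and lower-case, one per line.
sed -nf ./bang-gnu.sed ./infile
```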

That said, failure to handle this as specified is a bug for any implementation that claims compliance, so I think an email to the relevant dev boxes is called for here, and I intend to send one if you don't.
