Only simple cases can be expressed with && and ||. The if construct is more general. if CONDITION; then FOO; fi is equivalent to CONDITION && FOO as far as which commands run (assuming proper usage of braces to delimit a block if necessary; the exit status still differs when CONDITION fails), but as soon as there's an else (or elif), this is no longer possible in general.
if CONDITION; then FOO; else BAR; fi
is not equivalent to
CONDITION && FOO || BAR
If CONDITION is true, both constructs execute FOO. If FOO then returns false, the if construct skips BAR, whereas the && … || construct executes BAR.
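A quick way to see the difference. The functions condition, foo and bar below are hypothetical stand-ins for CONDITION, FOO and BAR, rigged so that the condition succeeds and FOO fails:

```shell
#!/bin/sh
# Hypothetical stand-ins: the condition succeeds, foo prints then fails,
# bar just prints.
condition() { true; }
foo() { echo FOO; return 1; }
bar() { echo BAR; }

# if/else: once the condition has succeeded, bar can never run.
with_if() { if condition; then foo; else bar; fi; }

# && ||: bar also runs whenever foo fails.
with_andor() { condition && foo || bar; }

with_if      # prints: FOO
with_andor   # prints: FOO, then BAR
```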
And no, you can't work around this with CONDITION && { FOO; true; } || BAR. That makes the compound command return true if CONDITION is true and FOO is false, whereas if CONDITION; then FOO; else BAR; fi returns false in that case.
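The exit statuses can be checked directly. This sketch uses true as the condition and hypothetical functions foo (which fails) and bar:

```shell
#!/bin/sh
# Hypothetical stand-ins: foo fails, bar prints. The condition is plain 'true'.
foo() { return 1; }
bar() { echo BAR; }

# Print the exit status of each construct when the condition holds and foo fails.
if_status()         { if true; then foo; else bar; fi; echo "$?"; }
workaround_status() { true && { foo; true; } || bar; echo "$?"; }

if_status           # prints: 1
workaround_status   # prints: 0
```

The "workaround" does keep bar from running, but only by discarding foo's failure.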
That's for the semantic difference. In addition, there's readability: nested uses of && and || very quickly become hard to decipher. In fact, I don't recommend using both in the same command, especially given that the two operators have equal precedence in the shell, whereas they have different precedences in C and most other C-inspired languages (including Perl and Ruby).
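A minimal illustration of the precedence trap. In the shell, the same three-term expression groups differently from how a C programmer would read it:

```shell
#!/bin/sh
# In the shell, && and || have equal precedence and group left to right,
# so this is parsed as { true || false; } && false, which fails.
shell_grouping() { if true || false && false; then echo true; else echo false; fi; }

# C reads 1 || 0 && 0 as 1 || (0 && 0); explicit braces give that reading.
c_grouping() { if true || { false && false; }; then echo true; else echo false; fi; }

shell_grouping   # prints: false
c_grouping       # prints: true
```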
sed's API is primitive, and this is by design. At least, it has remained primitive by design; whether it was designed primitively at inception I cannot say. In most cases, writing a sed script which, when run, will output another sed script is a simple matter indeed. sed is very often applied in this way by macro preprocessors such as m4 and/or make.
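As a small sketch of sed writing sed: the helper gen_script below is hypothetical. It turns each word it is given into one s command, and the resulting newline-separated script is fed straight back to another sed. It assumes plain words containing no /, &, \ or regexp metacharacters.

```shell
#!/bin/sh
# gen_script (hypothetical helper): one "s/WORD/WORD-case/g" command per argument.
# The outer s uses | as its delimiter so the generated / characters need no escaping,
# and & in its replacement stands for the whole matched line (the word).
gen_script() { printf '%s\n' "$@" | sed 's|.*|s/&/&-case/g|'; }

gen_script cat dog
# prints:
# s/cat/cat-case/g
# s/dog/dog-case/g

# The generated commands are passed back to sed as a single script operand.
printf 'cat dog\n' | sed "$(gen_script cat dog)"   # prints: cat-case dog-case
```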
(What follows is a highly hypothetical use case: it is a problem engineered to suit a solution. If it feels like a stretch to you, then that is probably because it is, but that doesn't necessarily make it any less valid.)
Consider the following input file:
cat <<'EOF' >./infile
camel
cat dog camel
dog cat
switch
upper
lower
EOF
If we wanted to write a sed script that would append the word -case to the tail of each appropriate word in the above input file, but only if the word were found on a line in an appropriate context, and we wanted to do so as efficiently as possible (as should be our goal during, for example, a compile operation), then we should prefer to avoid applying /regexp/ addresses as much as possible.
One thing we might do is pre-edit the file on our system right now, and never call sed at all during compilation. But if any of those words in the file should or should not be included based on local settings and/or compile-time options, then doing so would likely not be a desirable alternative.
Another thing we might do is process the file against regexps now. We can produce, and include in our compilation, a sed script which applies edits according to line number, which is typically a far more efficient route in the long run.
For example:
n=$(printf '\\\n\t')
grep -En 'camel|upper|lower' <infile |
sed " 1i${n%?}#!/usr/heirloom/bin/posix2001/sed -nf
s/[^:]*/:&$n&!n;&!b&$n&/;s/://2;\$a${n%?}q"'
s/ *cat/!/g;s/ *dog/!/g
s| *\([cul][^ ]*\).*|s/.*/\1-case/p|'
...which writes output in the form of a sed script that looks like...
#!/usr/heirloom/bin/posix2001/sed -nf
:1
1!n;1!b1
1s/.*/camel-case/p
:2
2!n;2!b2
2!!s/.*/camel-case/p
:5
5!n;5!b5
5s/.*/upper-case/p
:6
6!n;6!b6
6s/.*/lower-case/p
q
When that output is saved to an executable text file on my machine named ./bang.sed and run like ./bang.sed ./infile, the output is:
camel-case
upper-case
lower-case
Now you might ask me: Why would I want to do that? Why would I not just anchor grep's matches? Who uses camel-case anyway? And to each question I could only reply, I have no idea... because I don't. Before reading this question I had never personally noticed the multi-! parsing requirement in the spec; I think it's a pretty neat catch.
The multi-! thing did immediately make sense to me, though: much of the sed specification is geared toward simply parsed and simply generated sed scripts. You'll probably find that the required newline delimiters for [wr:bt{] make a lot more sense in that context, and if you keep that idea in mind you might make better sense of some other aspects of the spec (such as : accepting no addresses, and q refusing to accept more than one).
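For instance, a label runs to the end of its line, so the portable ways to terminate one are a newline or the end of an -e argument. A small sketch (the loop is purely illustrative; s/a/b/g would do the same job):

```shell
#!/bin/sh
# A loop built from a label (:), a substitution and a conditional branch (t).
# Each label is ended by a newline ...
loop_newlines() { printf 'aaa\n' | sed ':top
s/a/b/
t top'; }

# ... or by the end of its -e argument.
loop_args() { printf 'aaa\n' | sed -e ':top' -e 's/a/b/' -e 't top'; }

loop_newlines   # prints: bbb
loop_args       # prints: bbb
```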
In the example above I write out a certain form of sed script which can only ever be read once. If you look hard at it you might notice that as sed reads the edit-file it progresses from one command-block to the next; it never branches away from or completes its edit-script until it is completely through with its edit-file.
I consider that multi-! addresses might be more useful in that context than in some others, but, in honesty, I can't think of a single case in which I might have put them to very good use, and I use sed a lot. I also think it noteworthy that the GNU and BSD seds both fail to handle it as specified. This is probably not an aspect of the spec that is in much demand, so if an implementation overlooks it I very much doubt its bugs@ box will suffer terribly as a result.
That said, failure to handle this as specified is a bug for any implementation that claims compliance, so I think shooting an email to the relevant dev mailboxes is called for here, and I intend to do so if you don't.
So it's high time this question had an answer. Though I intuitively worked out how to do this correctly in pretty much every case some time ago, I only very recently managed to ground that understanding firmly in the text of the standard. It's actually stated there fairly simply; I just stupidly overlooked it many times, I guess.
The relevant portions of the text are all found under the heading...
Editing Commands in sed:
The argument text shall consist of one or more lines. Each embedded <newline> in the text shall be preceded by a <backslash>. Other <backslash>es in text shall be removed, and the following character shall be treated literally.
The r and w command verbs, and the w flag to the s command, take an optional rfile (or wfile) parameter, separated from the command verb letter or flag by one or more <blank>s; implementations may allow zero separation as an extension.
Command verbs other than {, a, b, c, i, r, t, w, :, and # can be followed by a <semicolon>, optional <blank>s, and another command verb. However, when the s command verb is used with the w flag, following it with another command in this manner produces undefined results.
...in...
Options:
Multiple -e and -f options may be specified. All commands shall be added to the script in the order specified, regardless of their origin.
-e script: Add the editing commands specified by the script option-argument to the end of the script of editing commands. The script option-argument shall have the same properties as the script operand, described in the OPERANDS section.
-f script_file: Add the editing commands in the file script_file to the end of the script.
And last in...
Operands:
...the final character need not be a <newline>.
So, when you take it all together, it makes sense that any command which is optionally followed by an arbitrary parameter without a predefined delimiter (as opposed to s/sub/repl/flag, for example, whose delimiter is whatever character follows the s) should be delimited by an unescaped newline.
It is arguable that ; is a predefined delimiter, but in that case using ; with any of the [aic] commands would necessitate a separate parser in the implementation specifically for those three commands; separate, that is, from the parser used for [:brw], for example. Otherwise the implementation would have to require that ; also be backslash-escaped within the text parameter, and it only grows more complicated from there.
If I were writing a sed which I wanted to be both compliant and efficient, then I expect I would not write such a separate parser, except that maybe [aic] should generate a syntax error if not immediately followed by a newline. But that is a simple tokenization problem; the end-delimiter case is generally the more problematic one. I would just write it so:
...and...
...would behave very similarly, in that the first would create and write to a file named:
...and the second would append a block of text to the current line on output like...
...because both would share the same parsing code for the parameter.
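A small sketch of the consequence, assuming GNU sed or another implementation that follows the standard here: a ; inside an a text or inside a w file name is simply part of the parameter, not a command separator. The scratch directory assumes mktemp -d is available (common, but not POSIX).

```shell
#!/bin/sh
# Scratch directory so the oddly named output file doesn't litter anything.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Appends the literal line "one; two" after each input line.
append_demo() { printf 'x\n' | sed 'a\
one; two'; }

# Writes each input line to a file literally named "out; p". No p command
# exists here, so with -n nothing reaches stdout.
write_demo() { printf 'x\n' | sed -n 'w out; p'; }

append_demo   # prints: x, then: one; two
write_demo    # prints nothing, but creates the file "out; p"
```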
And regarding the { ... } and $! issue: well, I was way off there. A single command preceded by an address is not a function; it is just an addressed command. Almost all commands, including the { function definition }, are specified to accept /one/ or /one/,/two/ addresses (a few, such as q, accept at most one), with the exception of the # comment and the : label definition, which accept none. And an address can be either a line number or a regular expression, and can be negated with !. So all of...
...can be followed by a ; and more commands according to the standard, but if more commands are required for a single address, and that address should not be re-evaluated after the execution of each command, then a { function } should be used like:
...where { cannot be followed on the same line by a closing }, and a closing } cannot occur except at the start of a line. But a contained command that would not otherwise need to be followed by a newline need not be followed by one within the function either. So all of the above s/// substitutions, and even the closing } brace, can portably be followed by ; semicolons and further commands.
I keep talking about newline delimiters, but the question is instead about -e expression statements, I know. But the two are really one and the same. The key relation is that a script can be given either as a literal command-line argument or as a file, with -e or -f respectively, and that both are interpreted as text files (which are specified to end in a newline), but neither need actually end in a newline. From this I can reasonably (I hope) infer that a \0 NUL-delimited argument implies an ending newline, and as all invocation arguments get at least a \0 NUL delimiter anyway, either should work fine.
In fact, in practice, in every case but one where the standard specifies that a \ backslash-escaped newline should be required, I have portably found...
...to work just as well. And in every case (again, in practice) where a non-escaped newline should be required...
...has worked for me, too. The one exception I mention above is...
...which does not work for any implementation in any of my tests. I'm fairly sure that falls back to the text-file requirement and the fact that s/// comes with its own delimiters, so there is no reason a single statement should span \0 NUL-delimited arguments.
So, in conclusion, here is a short rundown of portable ways to write several kinds of sed commands:
For any of [aic]:
...or...
For any of [:rwtb], where the parameter is optional (for all but :) but the delimiting newline is not. Note that I have never had a reason to try multiple-line label parameters as would be used with [:tb], but multiple-line [rw]file parameters (for writing/reading) are usually accepted without question by the seds I have tested, so long as the embedded newline is escaped with a \ backslash. Still, the standard does not directly specify that label and [rw]file parameters should be parsed identically to text parameters, and it makes no mention of newlines regarding the first two except as they delimit them.
...or...
...where the <space> above is optional for [:tb].
And last...
...or...
...where any of the aforementioned commands (excepting :) also accept at least one address, which can be either a /regexp/ or a line number and might be negated with !. But if more than one command is necessary for a single evaluation of an address, then { function context } delimiting braces must be used. A function can contain even multiple newline-delimited commands, but each must be delimited within the braces as it would be otherwise.
And that's how to write portable sed scripts.
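A compact sketch of a few of these forms side by side. The function names are hypothetical, and the split -e form relies on what, in practice, sed implementations do: treating consecutive -e arguments as if they were joined by a newline.

```shell
#!/bin/sh
# [aic]: the text goes on the line after a backslash-newline ...
append_newline() { printf 'x\n' | sed 'a\
added'; }

# ... or in the next -e argument.
append_split_e() { printf 'x\n' | sed -e 'a\' -e 'added'; }

# Several commands under one address: braces, with the closing } on a line
# of its own. The address 2 is evaluated once for both s and p.
braces() { printf '1\n2\n3\n' | sed -n '2{
s/2/two/;p
}'; }

append_newline   # prints: x, then: added
append_split_e   # prints: x, then: added
braces           # prints: two
```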