With the bash shell, in a file with rows like the following ones:

    first "line"
    <second>line and so on

I would like to replace one or more occurrences of "line"\n<second> with other characters and obtain each time:

    first other characters line and so on

So I have to replace a string that contains both special characters, such as " and <, and a newline character.

After searching through the other answers, I found that sed can accept newlines in the right-hand side of the command (so, in the other characters string), but not in the left.

Is there a way (simpler than this) to obtain this result with sed or grep?
Best Answer
Three different sed commands:

They all three build on the basic s/// substitution command:

They also all try to take care in their handling of the last line, as seds tend to differ on their output in edge cases. This is the meaning of $!, which is an address matching every line that is not (!) the last ($).

They also all use the N (next) command to append the next input line to pattern space following a \n newline character. Anyone who has been sed-ing for a while will have learned to rely on the \n newline character, because the only way to get one is to explicitly put it there.

All three make some attempt to read in as little input as possible before taking action: sed acts as soon as it can and needn't read in an entire input file before doing so.

Though they all use N, all three differ in their methods of recursion.

First Command
The first command employs a very simple N;P;D loop. These three commands are built in to any POSIX-compatible sed and they complement one another nicely.

N - as already mentioned, appends the next input line to pattern space following an inserted \n newline delimiter.

P - like p, it prints pattern space, but only up to the first occurring \n newline character. And so, given the following input/command:

    printf %s\\n one two | sed '$!N;P;d'

sed prints only one. However, with...

D - like d, it deletes pattern space and begins another line cycle. Unlike d, D deletes only up to the first occurring \n newline in pattern space. If there is more in pattern space following the \n newline character, sed begins the next line cycle with what remains. If the d in the previous example were replaced with a D, for example, sed would print both one and two.

This command recurses only for lines which do not match the s/// substitution statement. Because the s/// substitution removes the \n newline added with N, there is never anything remaining when sed deletes pattern space.

Tests could be done to apply the P and/or D selectively, but there are other commands which fit better with that strategy. Because the recursion is implemented to handle consecutive lines which match only part of the replacement rule, consecutive sequences of lines matching both ends of the s/// substitution do not work well.

Given this input:
...it prints...
It does, however, handle
...just fine.
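To make the behaviour described in this section concrete, here is a minimal sketch of an N;P;D loop applied to the task in the question. The regular expression "[^"]*"\n<[^>]*>, the function name replace_npd, and the sample inputs are assumptions chosen for illustration, not taken from the answer, and edge-case behaviour may differ between sed implementations.

```shell
# Hypothetical helper: an N;P;D loop around a single s/// substitution.
# The regex is an assumed pattern for: "quoted text", newline, <tag>.
replace_npd() {
    sed '$!N;s/"[^"]*"\n<[^>]*>/other characters /;P;D'
}

# P;d versus P;D on two lines of input.
printf '%s\n' one two | sed '$!N;P;d'    # prints: one
printf '%s\n' one two | sed '$!N;P;D'    # prints: one, then two

# The input from the question: one match spanning two lines.
printf '%s\n' 'first "line"' '<second>line and so on' | replace_npd
# prints: first other characters line and so on

# A line matching only the first half of the rule is printed on its own,
# and what remains is retried together with the following line.
printf '%s\n' 'first "line"' 'second "line"' '<second>line' | replace_npd
# prints:
#   first "line"
#   second other characters line

# Consecutive full matches: only every other occurrence is replaced.
printf '%s\n' 'first "line"' '<second>"line"' '<second>"line"' \
    '<second>line and so on' | replace_npd
# prints:
#   first other characters "line"
#   <second>other characters line and so on
```

In the last case the substitution consumes the newline, so D finds nothing left over and the next cycle starts on a fresh line, which is why the middle occurrence is skipped.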
Second Command
This command is very similar to the third. Both employ a branch/test label (as is also demonstrated in Joeseph R.'s answer here) and recurse back to it given certain conditions.

-e :n -e - portable sed scripts will delimit a : label definition with either a \n newline or a new inline -e execution statement.

:n - defines a label named n. This can be returned to at any time with either bn or tn.

tn - the t (test) command returns to a specified label (or, if none is provided, quits the script for the current line cycle) if any s/// substitution has succeeded since either the last input line was read or t last branched.

In this command the recursion occurs for the matching lines. If sed successfully replaces the pattern with other characters, sed returns to the :n label and tries again. If an s/// substitution is not performed, sed autoprints pattern space and begins the next line cycle.

This tends to handle consecutive sequences better. Where the last one failed, this prints:
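For illustration only, here is a minimal sketch of such a label-and-test loop. The regular expression, the function name replace_tloop, and the sample inputs are assumptions, not taken from the answer.

```shell
# Hypothetical helper: append a line, substitute, and loop back to :n
# for as long as the substitution keeps succeeding.
# The regex is an assumed pattern for: "quoted text", newline, <tag>.
replace_tloop() {
    sed -e :n -e '$!N;s/"[^"]*"\n<[^>]*>/other characters /;tn'
}

# The input from the question.
printf '%s\n' 'first "line"' '<second>line and so on' | replace_tloop
# prints: first other characters line and so on

# Consecutive full matches are all replaced, because each successful
# substitution sends sed back to :n to append the next line.
printf '%s\n' 'first "line"' '<second>"line"' '<second>"line"' \
    '<second>line and so on' | replace_tloop
# prints: first other characters other characters other characters line and so on
```

One consequence of this particular sketch: when the substitution fails, both lines in pattern space are printed together, so a match that starts on the second line of such a pair is not seen. With the input lines first "line", second "line", <second>line the output is unchanged.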
Third Command
As mentioned, the logic here is very similar to the last, but the test is more explicit.

/"$/bn - this is sed's test. Because the b (branch) command is a function of this address, sed will only branch back to :n after a \n newline is appended and pattern space still ends with a " double-quote.

There is as little done between N and b as possible. In this way sed can very quickly gather exactly as much input as necessary to ensure that the following line cannot match your rule. The s/// substitution differs here in that it employs the g (global) flag, and so it will do all necessary replacements at once. Given identical input this command outputs identically to the last.
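A minimal sketch of this gather-then-substitute approach follows. The regular expression, the function name replace_bloop, and the sample inputs are assumptions for illustration. The $! guard on the branch is also an assumption of this sketch: it keeps sed from looping forever when the last input line itself ends in a double-quote.

```shell
# Hypothetical helper: keep appending lines while pattern space ends in
# a double-quote, then do every replacement at once with the g flag.
# The regex is an assumed pattern for: "quoted text", newline, <tag>.
replace_bloop() {
    sed -e :n -e '$!N' -e '/"$/{' -e '$!bn' -e '}' \
        -e 's/"[^"]*"\n<[^>]*>/other characters /g'
}

# The input from the question.
printf '%s\n' 'first "line"' '<second>line and so on' | replace_bloop
# prints: first other characters line and so on

# Consecutive full matches: all lines are gathered first, then replaced.
printf '%s\n' 'first "line"' '<second>"line"' '<second>"line"' \
    '<second>line and so on' | replace_bloop
# prints: first other characters other characters other characters line and so on

# A last line ending in a double-quote: the $! guard stops the loop.
printf '%s\n' 'a "b"' | replace_bloop
# prints: a "b"
```

The closing brace is passed in its own -e fragment because a portable script ends a branch label at the newline, as with the :n definition.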