Confused by sed output when using N. Can someone explain these results?

sed

I am learning sed. Everything seemed to be going fine until I came across N (multi-line next). I created this file (guide.txt) for practice/understanding/context purposes. Here are the contents of said file…

This guide is meant to walk you through a day as a Network
Administrator. By the end, hopefully you will be better
equipped to perform your duties as a Network Administrator
and maybe even enjoy being a Network Administrator that much more.
Network Administrator
Network Administrator
I'm a Network Administrator

So my goal is to substitute ALL instances of "Network Administrator" with "System User". Because the first instance of "Network Administrator" is split across a newline (\n), I need the multi-line next operator (N) to append the line that starts with "Administrator" to the previous line ending with "Network\n". No problem. But I also want to catch all the other single-line instances of "Network Administrator".

From my research, I've learned that I will need two substitution commands; one for the newline separated string and one for the others. Also, there is some jive happening because of the last line containing the substitution match and the multi-line next. So I craft this…

$ sed '
> s/Network Administrator/System User/
> N
> s/Network\nAdministrator/System\nUser/
> ' guide.txt

This returns these results…

This guide is meant to walk you through a day as a System
User. By the end, hopefully you will be better
equipped to perform your duties as a System User
and maybe even enjoy being a Network Administrator that much more.
System User
Network Administrator
I'm a System User

I thought that the single-line substitution would catch all the "normal" instances of "Network Administrator" and swap it out for "System User", while the multi-line statement would work its magic on the newline separated instance, but as you can see it returned, what I consider, unexpected results.

After some fiddling, I landed on this…

$ sed '
> s/Network Administrator/System User/
> N
> s/Network\nAdministrator/System\nUser/
> s/Network Administrator/System User/
> ' guide.txt

And voilà, I get the desired output of…

This guide is meant to walk you through a day as a System
User. By the end, hopefully you will be better
equipped to perform your duties as a System User
and maybe even enjoy being a System User that much more.
System User
System User
I'm a System User

Why does this work and the original sed script doesn't? I really want to understand this.

Thanks in advance for any help.

Best Answer

As you are learning sed, I'll take the time to add to @John1024's answer:

1) Please note that you are using \n in the replacement string. This works in GNU sed, but is not part of POSIX, so many other seds will insert a literal n (or a backslash and an n) instead of a newline (using \n in the pattern is portable, btw).

Instead of this, I suggest using s/Network\([[:space:]]\)Administrator/System\1User/g: the [[:space:]] matches a newline or any other whitespace character, so you don't need two s commands and can combine them into one. By surrounding it with \(...\), you can refer to it in the replacement: the \1 is replaced by whatever was matched by the first \(...\) pair.
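To make the effect of \1 visible, here is a minimal sketch. The swap function and the sample inputs are made up for illustration, and it assumes $!N (append the next line unless the current line is the last one) to join two lines into one pattern space.

```shell
# Hypothetical helper: join up to two input lines, then run the single s command.
swap() {
    sed '$!N;s/Network\([[:space:]]\)Administrator/System\1User/g'
}

# The separator is a space, so \1 puts a space back.
same_line=$(printf 'a Network Administrator\n' | swap)

# The separator is the newline between the two joined lines, so \1 puts a newline back.
split_line=$(printf 'a Network\nAdministrator b\n' | swap)

printf '%s\n' "$same_line" "$split_line"
```

The first input should come back as one line and the second as two lines, because the captured separator is reused unchanged and no \n is needed in the replacement.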

2) To properly match patterns over two lines, you should know the N;P;D pattern:

 sed '$!N;s/Network\([[:space:]]\)Administrator/System\1User/g;P;D'

The N always appends the next line, except on the last line; that's why it is "addressed" with $! (= if not the last line). You should always consider preceding N with $! to avoid accidentally ending the script: when there is no next line, N quits, and a POSIX sed does so without printing the pattern space (GNU sed prints it first). Then, after the replacement, the P prints only the first line in the pattern space, and the D deletes this line and starts the next cycle with the remains of the pattern space (without reading the next line). This is probably what you originally intended.
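The difference D makes can be seen by printing the pattern space of every cycle with the l command, which shows an embedded newline as \n and marks the end with $. This is only a sketch on a made-up four-line input, not part of the original script.

```shell
# Without D: every cycle starts with a fresh line, so lines are consumed in pairs.
pairs=$(printf '1\n2\n3\n4\n' | sed -n '$!N;l')

# With D: the second line of each pair is kept and starts the next cycle,
# so every pair of adjacent lines is seen together once.
window=$(printf '1\n2\n3\n4\n' | sed -n '$!N;l;D')

printf 'pairs:\n%s\n\nwindow:\n%s\n' "$pairs" "$window"
```

Without D the cycles see 1\n2 and 3\n4, so a match spanning lines 2 and 3 would be missed, and any command placed before N only ever runs on lines 1 and 3. With D the cycles see 1\n2, 2\n3, 3\n4 and finally 4 alone.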

Remember this pattern, you will often need it.
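As a check, the N;P;D command can be run against the guide.txt from the question. The sketch below recreates the file in a temporary directory (the directory and variable names are arbitrary choices for this example).

```shell
# Recreate the question's guide.txt in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' \
    'This guide is meant to walk you through a day as a Network' \
    'Administrator. By the end, hopefully you will be better' \
    'equipped to perform your duties as a Network Administrator' \
    'and maybe even enjoy being a Network Administrator that much more.' \
    'Network Administrator' \
    'Network Administrator' \
    "I'm a Network Administrator" > "$dir/guide.txt"

# Sliding two-line window: join, substitute, print the first line, keep the second.
result=$(sed '$!N;s/Network\([[:space:]]\)Administrator/System\1User/g;P;D' "$dir/guide.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

This should print the same seven lines as the desired output in the question, using a single s command.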

3) Another useful pattern for multiline editing, especially when more than two lines are involved: Hold space collecting, as I suggested to John:

sed 'H;1h;$!d;g;s/Network\([[:space:]]\)Administrator/System\1User/g'

I repeat it to explain it: H appends each line to the hold space. As this would result in an extra newline before the first line, the first line is instead copied there with 1h, which overwrites the hold space rather than appending to it. The following $!d means "for all lines except the last one, delete the pattern space and start over". Thus, the rest of the script is only executed for the last line. At this point, the whole file is collected in the hold space (so don't use this for very large files!) and the g copies it to the pattern space, so you can do all replacements at once, as you can with the -z option of GNU sed.
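A small sketch of the collecting step, on made-up input: the first command uses l to show what the pattern space holds once the last line is reached, and the second runs the substitution on a phrase that is split across line breaks twice.

```shell
# Show the collected pattern space: all three lines, joined by newlines.
collected=$(printf 'a\nb\nc\n' | sed -n 'H;1h;$!d;g;l')

# With the whole input in the pattern space, one s command with the g flag
# replaces every occurrence, including those that span a line break.
replaced=$(printf 'Network\nAdministrator and Network\nAdministrator\n' |
    sed 'H;1h;$!d;g;s/Network\([[:space:]]\)Administrator/System\1User/g')

printf '%s\n\n%s\n' "$collected" "$replaced"
```

The l output should read a\nb\nc$, confirming that 1h avoided a leading newline, and the second command should print System, User and System, User on three lines.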

This is another useful pattern I suggest keeping in mind.
