GNU sed – Does the p Command Append a Newline When Printing?


root@u1804:~# sed --version
sed (GNU sed) 4.5
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <https://www.gnu.org/software/sed/>.
General help using GNU software: <https://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.
root@u1804:~#

I'm new to sed, and I drew the workflow diagram below based on my understanding of it (please correct me if anything is wrong).

[Diagram: sed's processing cycle as understood by the asker]

So it seems that the default auto-printing of the pattern space always includes a newline at the end. My question is: does p include a newline, too? Here are some examples.

root@u1804:~# seq 3 | sed -rn 'p'
1
2
3
root@u1804:~#

Here the newline at the end of each number is added by sed itself (see the diagram step "adds back newline to pattern space"), so it seems that p does not append a newline. However, consider the following example:

root@u1804:~# seq 3 | sed -rn 'x;p;x;p'

1

2

3
root@u1804:~#

Here x exchanges the pattern space with the hold space, which leaves the pattern space empty. A p applied to the (empty) pattern space should then print nothing. Judging by the result, however, p prints a newline here. This looks like inconsistent behavior to me. Can anyone explain?

Best Answer

To answer your main question:

GNU sed will append a <newline> character when executing the p command unless the input line was missing its terminating <newline> character (see the clarifications about lines below).

As far as I can tell, sed's p command and its auto-print feature use the same logic to output the pattern space: if the trailing <newline> character was removed when the line was read, they add it back; otherwise they don't.

Examples:

$ printf '%s\n%s' '4' '5' | sed ';' | hexdump -C      # auto-print
00000000  34 0a 35                                          |4.5|
00000003
$ printf '%s\n%s' '4' '5' | sed -n 'p;' | hexdump -C  # no auto-print; p command
00000000  34 0a 35                                          |4.5|
00000003

In both cases, the output contains no <newline> character (0a) after the input line that lacked one.
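The same byte-for-byte comparison can be scripted. The following is a minimal sketch, assuming GNU sed and the POSIX od utility (used here instead of hexdump, which is not always installed); the helper name compare is hypothetical. It feeds identical input to the auto-print path and to an explicit p, once with and once without a final <newline>, and checks that the bytes match.

```shell
#!/bin/sh
# Hypothetical helper: feed the same input to sed's auto-print path and
# to an explicit p (auto-print disabled with -n), then compare the bytes.
compare() {
    auto=$(printf '%b' "$1" | sed ';' | od -An -tx1)
    explicit=$(printf '%b' "$1" | sed -n 'p;' | od -An -tx1)
    if [ "$auto" = "$explicit" ]; then
        printf 'same:%s\n' "$auto"
    else
        printf 'different\n'
    fi
}

compare '4\n5\n'   # terminated last line: the bytes end with 0a
compare '4\n5'     # unterminated last line: the bytes end with 35
```

If both calls report "same", the two output paths treat the missing <newline> identically.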


About your diagrams:

"Adds back newline to pattern space" is probably inaccurate, because the <newline> character is never put into the pattern space [1]. Also, that step is not related to the -n option. This does not make the diagram wrong, though; the step should probably just be merged into "Print pattern space".
Still, I agree with you about the documentation's lack of clarity.

[1] The sentence you quote in your own answer, "the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed", means that the <newline> is appended to the output stream, not to the pattern space. Of course, since the pattern space is cleared shortly afterwards, this is a very minor point.
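One way to check that the <newline> never enters the pattern space is to inspect the pattern space directly. A minimal sketch, assuming GNU sed (for \n in a regular expression) and od: the l command prints the pattern space unambiguously and marks its end with $, and a substitution of \n finds nothing to replace, yet the output still ends with a <newline>.

```shell
#!/bin/sh
# l shows the pattern space unambiguously; its end is marked with "$".
# For the input "a<newline>" it shows a$ (no newline inside the pattern space).
shown=$(printf 'a\n' | sed -n 'l')

# No newline in the pattern space, so s/\n/X/ has nothing to replace;
# the newline in the output is added when the pattern space is printed.
out=$(printf 'a\n' | sed 's/\n/X/' | od -An -c)

printf 'l:     %s\n' "$shown"
printf 'subst: %s\n' "$out"
```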


About your tests involving the x command:

Internally, the pattern space and the hold space are structures, and one of their members answers the question "was my trailing <newline> character dropped?". We will call it chomped (which, by the way, is its name in sed's source code).
The pattern space is filled with a line that was read, and its chomped attribute depends on how that line is terminated: true if it ends with a <newline> character, false otherwise. The hold space, on the other hand, is initialized as empty and its chomped attribute is simply set to true.
Since x swaps the structures as a whole, chomped included, when you swap the pattern space and the hold space and print what started out as the hold space (and is now the pattern space), a <newline> character is printed.
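To see chomped travel with the swap, one can run the question's x;p;x;p script on input whose last line has no terminating <newline>. A minimal sketch, assuming GNU sed and od: in every cycle the first p prints the empty structure that started out as the hold space (chomped is true, so only a <newline> appears), and the second p prints the input line, adding a <newline> only if one was removed. The expected bytes are built with printf for comparison; the variable names are illustrative.

```shell
#!/bin/sh
# Input: three lines, the last one WITHOUT a terminating newline.
input='1\n2\n3'

# Each cycle: x moves the empty "hold" structure (chomped = true) into the
# pattern space and p prints just a newline; x swaps back and p prints the
# input line, re-adding a newline only if one was chomped off it.
actual=$(printf '%b' "$input" | sed -n 'x;p;x;p' | od -An -c)

# Expected: an empty line before each number, and no newline after "3".
expected=$(printf '\n1\n\n2\n\n3' | od -An -c)

printf 'actual:   %s\n' "$actual"
printf 'expected: %s\n' "$expected"
```

This reproduces the question's output, except that the final <newline> is now missing, because the input's last line lacked one.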

Examples (these commands have the same output):

$ printf '\n' | sed -n 'p;' | hexdump -C        # input is only a <newline>
00000000  0a                                                |.|
00000001
$ printf '%s' '5' | sed -n 'x;p;' | hexdump -C  # input has no <newline>
00000000  0a                                                |.|
00000001

(I only took a brief look at sed's code, so this may well not be accurate.)


About lines (a clarification that started in the comments on your answer):

It goes without saying that a line without a terminating <newline> character is a problematic concept. Quoting POSIX:

3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

Furthermore, POSIX defines a text file:

3.403 Text File
A file that contains characters organized into zero or more lines. ...

Finally, here is POSIX on sed (note the words "text files"):

DESCRIPTION
The sed utility is a stream editor that shall read one or more text files, make editing changes according to a script of editing commands, and write the results to standard output. ...

GNU sed, though, seems to be less strict when defining its input:

sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). ...

So, coming back to my first sentence, we should keep in mind that for GNU sed, what is read into the pattern space does not necessarily have to be a well-formed line of text.
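A small demonstration of that leniency, as a sketch assuming GNU sed plus the standard wc and od utilities: wc -l counts <newline> characters, so it reports only one line for the input below, yet sed reads the unterminated "b" into the pattern space, addresses it as the last line ($), edits it, and writes it back without adding a <newline>.

```shell
#!/bin/sh
# "a" is a POSIX line; "b" is not, because it lacks a terminating newline.
input='a\nb'

# wc -l counts newline characters, so it reports 1 (spaces stripped for portability).
lines=$(printf '%b' "$input" | wc -l | tr -d ' ')

# GNU sed still treats "b" as the last line: $ addresses it,
# and the s command edits it like any other line.
last=$(printf '%b' "$input" | sed -n '$p' | od -An -c)
edited=$(printf '%b' "$input" | sed 's/$/!/' | od -An -c)

printf 'wc -l:  %s\n' "$lines"
printf 'last:   %s\n' "$last"
printf 'edited: %s\n' "$edited"
```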
