Shell Command – Missing Trailing Newline Character in Command Substitution

command line, command-substitution, shell, text processing

The following code best describes the situation. Why does the last line not output the trailing newline character? Each line's output is shown in the comment. I'm using GNU bash, version 4.1.5.

     echo -n $'a\nb\n'                  | xxd -p  # 610a620a  
           x=$'a\nb\n'   ; echo -n "$x" | xxd -p  # 610a620a
     echo -ne "a\nb\n"                  | xxd -p  # 610a620a
x="$(echo -ne "a\nb\n")" ; echo -n "$x" | xxd -p  # 610a62

Best Answer

Command substitution, written $() (or with its older cousin, the backtick), removes all trailing newlines from the captured output. This is the documented behavior, and you should always be aware of it when using the construct.
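As a minimal sketch of that behavior (the variable names are only illustrative), the following shows that every trailing newline is dropped, not just one. The second half shows a commonly used workaround that is not part of the original question: append a sentinel character inside the substitution, then strip it afterwards.

```shell
# Command substitution drops every trailing newline, however many there are.
stripped="$(printf 'a\nb\n\n\n')"
printf '%s' "$stripped" | od -An -tx1    # 61 0a 62

# Workaround sketch: add a sentinel character so the newline is no longer
# trailing, then remove the sentinel with a parameter expansion.
kept="$(printf 'a\nb\n'; printf 'x')"
kept="${kept%x}"
printf '%s' "$kept" | od -An -tx1        # 61 0a 62 0a
```

The sentinel can be any character; it only has to be the last thing printed, so that it is exactly what `${kept%x}` removes.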

Newlines inside the output are not removed by command substitution itself, but if the substitution is unquoted, the shell performs word splitting on the result and treats those newlines as separators. So the outcome depends on whether you used quotes or not. Note the difference between these two usages:

$ echo -n "$(echo -ne 'a\nb')"
a
b

$ !! | xxd -p
610a62

$ echo -n  $(echo -ne 'a\nb')
a b

$ !! | xxd -p
612062

In the second example, the substitution wasn't quoted, so the newline was treated as a word separator: echo received two arguments, a and b, and joined them with a space.
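To make the word splitting visible, here is a small sketch that counts the arguments a command actually receives. `count_args` is a hypothetical helper written for this illustration, and the default `IFS` (space, tab, newline) is assumed.

```shell
# Hypothetical helper: print how many arguments it was called with.
count_args() { echo "$#"; }

count_args "$(printf 'a\nb')"    # 1: quoted, the newline stays inside one word
count_args $(printf 'a\nb')      # 2: unquoted, the newline splits the result

# echo joins its arguments with single spaces, hence "a b".
echo $(printf 'a\nb')
```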

Related Solutions

Where has the `uniq` or `sort -u` line gone, with some unicode characters

Short version: collation doesn't really work in command line utilities.

Longer version: the underlying function to compare two strings is strcoll. The description isn't very helpful, but the conceptual method of operation is to convert both strings to a canonical form, and then compare the two canonical forms. The function strxfrm constructs this canonical form.

Let's observe the canonical forms of a few strings (with GNU libc, under Debian squeeze):

$ export LC_ALL=en_US.UTF-8
$ perl -C255 -MPOSIX -le 'print "$_ ", unpack("h*", strxfrm($_)) foreach @ARGV' b a A à 〼 〇
b d010801020
a c010801020
A c010801090
à 101010102c6b
〼 101010102c6b102c6b102c6b
〇 101010102c6b102c6b102c6b

As you can see, 〼 and 〇 have the same canonical form. I think that's because these characters are not mentioned in the collation tables of the en_US.UTF-8 locale. They are, however, present in a Japanese locale.

$ export LC_ALL=ja_JP.UTF-8
$ perl -C255 -MPOSIX -le 'print "$_ ", unpack("h*", strxfrm($_)) foreach @ARGV' 〼 〇 
〼 303030
〇 3c9b

The source code for the locale data (in Debian squeeze) is in /usr/share/i18n/locales/en_US, which includes /usr/share/i18n/locales/iso14651_t1_common. This file doesn't have an entry for U3007 or U303C, nor are they included in any range that I can find.

I'm not familiar with the rules to build the collation order, but from what I understand, the relevant phrasing is

The symbol UNDEFINED shall be interpreted as including all coded character set values not specified explicitly or via the ellipsis symbol. (…) If no UNDEFINED symbol is specified, and the current coded character set contains characters not specified in this section, the utility shall issue a warning message and place such characters at the end of the character collation order.

It looks like Glibc is instead ignoring characters that aren't specified. I don't know if there's a flaw in my understanding of the POSIX spec, if I missed something in Glibc's locale definition, or if there's a bug in the Glibc locale compiler.
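A rough way to check this on your own system is sketched below. It assumes the en_US.UTF-8 locale is installed, and whether the first command actually loses a line depends on the Glibc version and its locale data. In the C locale, strings are compared byte by byte, so two different strings never compare as equal.

```shell
# May print a single line on systems where both characters collate as equal
# (assumption: en_US.UTF-8 is available; results vary between Glibc versions).
printf '%s\n' '〼' '〇' | LC_ALL=en_US.UTF-8 sort -u

# Byte-wise comparison in the C locale: both lines are kept.
printf '%s\n' '〼' '〇' | LC_ALL=C sort -u
```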

Shell – Why Does Command Substitution Remove Trailing Newline?

Because the shell was not originally intended to be a full programming language.

It is quite difficult to remove a trailing \n from some command output. However, for display purposes, almost all commands end their output with \n, so… there has to be a simple way to remove it when you want to use it in another command. Automatic removal with the $() construction was the chosen solution.

So, maybe you'll accept this question as an answer:

Can you find a simple way to remove the trailing \n if this was not done automatically in the following command?

> echo The current date is "$(date)", have a good day!

Note that quoting is required to prevent word splitting from collapsing the double spaces that may appear in formatted dates.
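The effect of the quotes can be sketched with a fixed string standing in for the output of date, so that the result is reproducible. `fake_date` is a hypothetical stand-in, not a real command.

```shell
# Hypothetical stand-in for `date`, with a space-padded day of the month.
fake_date() { printf 'Mon Jan  5 10:00:00 UTC 2026\n'; }

# Quoted: the double space in "Jan  5" survives.
echo The current date is "$(fake_date)", have a good day!

# Unquoted: word splitting collapses it to "Jan 5".
echo The current date is $(fake_date), have a good day!
```

In both cases the trailing newline printed by `fake_date` is removed by the substitution, which is why the comma follows the year directly.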
