I am trying to add missing quotes at the ends of some lines in a text file.
I find that the regex [^\"]$ suffices to find lines with missing terminal double quotes, so I tried the following replacement using a backreference (which tbh I've never used before). Using parens around the 'capture group', I hoped that sed would allow a backreference to that group, but
sed 's|([^\"]$)|\1\"|g' bigfile.tsv
hits
sed: -e expression #1, char 17: invalid reference \1 on `s' command's RHS
and same if I don't escape the replacement quotes
sed 's|([^\"]$)|\1"|g' bigfile.tsv
(tho now it's char 16 that's offensive). How does the backreference go? https://xkcd.com/1171/
Best Answer
When you run sed without -E, the expression is a basic regular expression and capture groups must be written as \(...\). When you use -E to enable extended regular expressions, capture groups are written (...).

The \ inside [...] is literal, so your expression would also avoid adding a double quote on lines ending with a backslash. Some of the other escaping is also unnecessary.

Therefore, you may write your sed command either with \(...\) and no option, or with (...) and -E, dropping the unneeded backslashes in both cases.
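The commands themselves did not survive in the text above, so here is a minimal sketch of the two spellings. It assumes a small temporary sample file (hypothetical, standing in for bigfile.tsv) and a sed that supports -E, as GNU and BSD sed do.

```shell
# Hypothetical sample input standing in for bigfile.tsv:
# the second line is missing its closing double quote.
sample=$(mktemp)
printf '"a"\t"b"\n"c"\t"d\n' > "$sample"

# Basic regular expression (the default): the group is written \(...\).
bre_out=$(sed 's/\([^"]\)$/\1"/' "$sample")

# Extended regular expression (-E): the group is written (...).
ere_out=$(sed -E 's/([^"])$/\1"/' "$sample")

printf '%s\n' "$bre_out"
printf '%s\n' "$ere_out"

rm -f "$sample"
```

Both commands should leave the first line alone and append a double quote to the second.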
Or, using & instead of a capture group: the & in the replacement part of the expression will be substituted by the part of the input that matched the regular expression.

A couple of other alternatives that do not use a capture group:
One applies s/$/"/ to all lines that match /[^"]$/.

The other applies s/$/"/ to all lines that don't match /"$/ (there's a slight difference from the other approaches here in that it also adds a double quote to empty lines).

Note that in all cases, the g flag at the end is not needed, since $ can match at most once per line.
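The commands for the & form and the two address-based forms are missing above, so the following is a sketch of what they plausibly look like. The temporary sample file is hypothetical and includes an empty line to show where the last form behaves differently.

```shell
# Hypothetical sample input: a complete line, an empty line,
# and a line missing its closing double quote.
sample=$(mktemp)
printf '"a"\t"b"\n\n"c"\t"d\n' > "$sample"

# & in the replacement stands for the matched text (the final non-quote character).
amp_out=$(sed 's/[^"]$/&"/' "$sample")

# Address form: run s/$/"/ only on lines that match /[^"]$/.
addr_out=$(sed '/[^"]$/ s/$/"/' "$sample")

# Negated address: run s/$/"/ on lines that do NOT match /"$/.
# Unlike the two forms above, this also turns the empty line into a lone quote.
neg_out=$(sed '/"$/!s/$/"/' "$sample")

printf '%s\n---\n' "$amp_out" "$addr_out" "$neg_out"

rm -f "$sample"
```

The first two commands should leave the empty line untouched, because [^"] needs an actual character to match.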