Regex to add missing quotes

regular expression, sed

I am trying to add missing quotes at the ends of some lines in a text file.

I find that the regex [^\"]$ suffices to find lines with missing terminal double quotes, so I tried the following replacement using a backreference (which, to be honest, I have never used before). By putting parentheses around the capture group I hoped that sed would allow a backreference to that group, but

sed  's|([^\"]$)|\1\"|g' bigfile.tsv

hits

sed: -e expression #1, char 17: invalid reference \1 on `s' command's RHS

and same if I don't escape the replacement quotes

sed  's|([^\"]$)|\1"|g' bigfile.tsv

(though now it's char 16 that is at fault). How should the backreference be written? https://xkcd.com/1171/

Best Answer

When you run sed without -E, then the expression is a basic regular expression and the capture groups must be written as \(...\). When you use -E to enable extended regular expressions, capture groups are written (...).

The \ inside [...] is literal, so your expression would also avoid adding a double quote on lines ending with \. Some of the other escaping is also unnecessary.
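To illustrate the point about the literal backslash, here is a minimal sketch. The three sample lines and the names sample, with_backslash and without_backslash are made up for this example and are not from the question's file.

```shell
# Hypothetical sample input: one line missing its closing quote, one complete
# line, and one line ending in a backslash.
sample() { printf '%s\n' '"abc' '"def"' '"ghi\'; }

# Backslash kept inside the bracket expression: the line ending in \ is skipped.
with_backslash=$(sample | sed 's/\([^\"]\)$/\1"/')

# Backslash removed: every line that does not end in " gets one appended.
without_backslash=$(sample | sed 's/\([^"]\)$/\1"/')

printf '%s\n' "$with_backslash" '---' "$without_backslash"
```

In the first result the line ending in \ should come back unchanged, while in the second it gains a closing quote.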

Therefore, you may write your sed command as

sed 's/\([^"]\)$/\1"/'

or as

sed -E 's/([^"])$/\1"/'

Or, using &:

sed 's/[^"]$/&"/'

The & in the replacement part of the expression will be substituted by the part of the input that matched the regular expression.
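A small sketch of how & behaves, using made-up input strings (the variable names are only for illustration):

```shell
# & stands for the whole match, here the single character b.
bracketed=$(printf '%s\n' 'abc' | sed 's/b/[&]/')

# Applied to the quoting problem: & is the last character of the line.
quoted=$(printf '%s\n' '"unterminated' | sed 's/[^"]$/&"/')

# The capture-group form should give the same result.
grouped=$(printf '%s\n' '"unterminated' | sed 's/\([^"]\)$/\1"/')

printf '%s\n' "$bracketed" "$quoted" "$grouped"
```

Since the whole match is a single character here, & and \1 refer to the same text, so the two forms are interchangeable for this task.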

A couple of other alternatives that do not use a capture group:

sed '/[^"]$/ s/$/"/'

This applies s/$/"/ to all lines that match /[^"]$/.

Or, alternatively,

sed '/"$/ !s/$/"/'

This applies s/$/"/ to all lines that don't match /"$/ (there's a slight difference from the other approaches here in that it also adds a " to empty lines).
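The empty-line difference can be checked with a short sketch. The sample input (which includes one empty line) and the variable names are assumptions made for the example.

```shell
# Hypothetical sample input: an unterminated line, an empty line, a complete line.
sample() { printf '%s\n' '"abc' '' '"def"'; }

# Address selects lines ending in a non-quote character; an empty line has no
# such character, so it is left alone.
by_match=$(sample | sed '/[^"]$/ s/$/"/')

# Address selects lines that do NOT end in a quote; an empty line qualifies,
# so it becomes a lone ".
by_negation=$(sample | sed '/"$/ !s/$/"/')

printf '%s\n' "$by_match" '---' "$by_negation"
```

If empty lines in the file should stay empty, the first form (or any of the [^"]$ variants) is the safer choice.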

Note that in all cases, the g flag at the end is not needed: the pattern is anchored with $, so it can match at most once per line.
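As a quick check that g makes no difference here, a sketch with made-up sample lines (names are illustrative only):

```shell
# Hypothetical sample input: one unterminated line and one complete line.
sample() { printf '%s\n' '"abc' '"def"'; }

without_g=$(sample | sed 's/[^"]$/&"/')
with_g=$(sample | sed 's/[^"]$/&"/g')

# Both runs should produce the same text.
[ "$without_g" = "$with_g" ] && echo "identical output"
```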
