How to search for the word stored in the hold space with sed

perl, regular expression, sed, text processing

This is a sed-specific question; I am well aware it could be done with other tools but I am working on expanding my knowledge of sed.

How can I use sed to globally quote (actually backtick) a word that is not specified in the script? The word is held in the hold space.

What I want is something like:

s/word/`&`/g

But the trick is, word will be contained not in the sed script but in the hold space. So it looks something more like:

H
g
s/^\(.*\)\n\(.*\)\1\(.*\)$/\2`\1`\3/

which will quote one occurrence of the word held in the hold space. I want to quote all of them, but I can't just add a g flag, because of the way this uses backreferences rather than a static regex.

H
g
s/^\(.*\)\n\(.*\)\1\(.*\)\1\(.*\)$/\2`\1`\3`\1`\4/

This handles exactly two occurrences of the word, but it fails when there is only one and leaves any beyond two unquoted.
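For reference, here is a minimal reproducible sketch of the first attempt. It assumes GNU sed and a hypothetical two-line input (the word on line 1, the text on line 2). The 1h and 2{...} wrapping is added only to load the hold space for the demonstration.

```shell
# Hypothetical input: line 1 is the word, line 2 is the text to process.
input='word
a word and another word'

# 1h    : store the word in the hold space
# 2{...}: on the text line, H appends it to the hold space, g copies the
#         combined "word\ntext" back, and s/// quotes a single occurrence
result=$(printf '%s\n' "$input" |
  sed -n '1h; 2{H; g; s/^\(.*\)\n\(.*\)\1\(.*\)$/\2`\1`\3/; p}')

printf '%s\n' "$result"
```

Because the second group is greedy, the one occurrence that ends up quoted is the last one on the line.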

I thought I could use something clean and simple like:

s//`&`/g

But that reuses the last used regex, not what it matches. (Which makes sense.)
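A small sketch of that behavior, assuming GNU sed and a made-up input line: the empty regex in s// stands for the most recently used regex (here the address /foo/), so it only helps when the word is already written out in the script.

```shell
# The address /foo/ is the last regex used, so the empty regex in s// reuses it.
result=$(printf '%s\n' 'foo bar foo' | sed '/foo/s//`&`/g')

printf '%s\n' "$result"
```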

Is there any way in sed to do what I am trying to do? (Actually I would be interested in seeing how easy this would be in perl, but I would still like to see how to do it in sed.)


UPDATE

Not that it's needed for this question, but I thought I would give a little more context on what exactly I was doing when I came up with this question:

I had a big text file of documentation, certain parts of which needed to be condensed and summarized into an asciidoc table. It was pretty easy because of the Description: and Prototype: lines, etc., so I actually wrote a quick sed script to do all the parsing for me. It worked beautifully—but the one thing it was missing was that I wanted to backtick the words in the Description line that matched the arguments listed in the Prototype line. The prototype lines looked something like this:

Prototype: some_words_here(and, arg, list,here)

There were upwards of 200 different entries in the table I was outputting (and the source documentation included a lot more text than that) and each arglist only needed to be used to backtick-quote matching words on a single line. To make things trickier, some of the args were not in the Description line, some were in more than once, and some arglists were empty().

However, given that sometimes an arg would match a part of a word, which I didn't want to get backticked, and sometimes an arg name was a common word (like from) which I only wanted to get backticked when it was used in the context of explaining the use of the function, an automated solution wasn't actually a good fit at all and I instead used vim to do the job semi-manually, with the help of some tricky macros. 🙂

Best Answer

That was a hard one. Assuming you have a file like this:

$ cat file
word
line with a word and words and wording wordy words.

Where:

  • Line 1 is the search pattern, which should be stored in the hold space and quoted as `word`.
  • Line 2 is the line to search and replace globally.

The sed command:

sed -n '1h; 2{x;G;:l;s/^\([^\n]\+\)\n\(.*[^`]\)\1\([^`]\)/\1\n\2`\1`\3/;tl;p}' file

Explanation:

  • 1h; save the first line to the hold space (this is what we want to search for).
    • hold space contains: word
  • 2{...} applies to the second line.
  • x; exchange the pattern space and the hold space.
  • G; append the hold space to the pattern space. In the pattern space we have now:
word # I will call this line the "pattern line" from now on
line with a word and words and wording wordy words.
  • :l; set a label called l as a jump target for later.
  • s/// do the actual search/replace in the pattern space mentioned above:
    • ^\([^\n]\+\)\n search in the "pattern line" for all characters (from the beginning of the line ^) which are not a newline [^\n] (one or more times \+), until a newline \n. This is now stored in the back-reference \1. It contains the "pattern line".
    • \(.*[^`]\) search for any characters .* followed by one character that is not a backtick [^`]. This is stored in \2. On the first pass \2 contains: line with a word and words and wording wordy, that is, everything up to the last occurrence of word, because...
    • \1 is the next search term (the back-reference \1, i.e. word), which is what the "pattern line" contains.
    • \([^`]\) this is followed by another character which is not a backtick; it is saved to back-reference \3. If we didn't do this (and the [^`] part in \2 above), we would end up in an endless loop quoting the same word again and again -> ````word````, because s/// would always be successful and tl; would keep jumping back to :l (see tl; further down).
    • \1\n\2`\1`\3 all of the above is replaced by the back-references. The second \1 is the one we quote (note that the first \1 is the "pattern line").
  • tl; if the s/// was successful (we replaced something), jump to the label called l and start again, until there is nothing more to search and replace. This is the case when all occurrences of word are replaced/quoted.
  • p; when all is done, print the altered line (pattern space).
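The same script can be laid out one command per line with comments, which may be easier to follow than the one-liner. This is only a restatement of the command above, assuming GNU sed (for \+ and for \n inside a bracket expression), with the two sample lines piped in rather than read from a file.

```shell
result=$(printf '%s\n' 'word' 'line with a word and words and wording wordy words.' |
  sed -n '
    # Line 1: remember the word in the hold space.
    1h
    2{
      # Swap, then append: the pattern space becomes "word\ntext line".
      x
      G
      # Loop: quote the last occurrence that is not yet next to a backtick.
      :l
      s/^\([^\n]\+\)\n\(.*[^`]\)\1\([^`]\)/\1\n\2`\1`\3/
      # Branch back to :l for as long as the substitution succeeds.
      tl
      p
    }
  ')

printf '%s\n' "$result"
```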

The output:

$ sed -n '1h; 2{x;G;:l;s/^\([^\n]\+\)\n\(.*[^`]\)\1\([^`]\)/\1\n\2`\1`\3/;tl;p}' file
word
line with a `word` and `word`s and `word`ing `word`y `word`s.
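One limitation worth noting: because the regex demands a non-backtick character on both sides of the word, an occurrence at the very start or very end of the text line is left unquoted. The following is one possible variant, not part of the original answer. It assumes GNU sed and uses an extra newline as a hypothetical "processed up to here" marker instead of testing for backticks, and it prints only the text line.

```shell
result=$(printf '%s\n' 'word' 'word at the start, swords, and at the end word' |
  sed -n '
    1h
    2{
      # Build "word\ntext\n": the trailing newline marks the processed tail.
      x
      G
      s/$/\n/
      # Quote the last occurrence left of the marker, then move the marker
      # to just before it, so each occurrence is visited exactly once.
      :l
      s/^\([^\n]\+\)\n\(.*\)\1\([^\n]*\)\n/\1\n\2\n`\1`\3/
      tl
      # Drop the leading word line and the marker.
      s/^[^\n]*\n//
      s/\n//
      p
    }
  ')

printf '%s\n' "$result"
```

Like the original, this still quotes partial matches such as the word inside "swords"; restricting it to whole words would need extra boundary checks.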