This is a sed-specific question; I am well aware it could be done with other tools, but I am working on expanding my knowledge of sed.
How can I use sed to globally quote (actually backtick) a word that is not specified in the script? The word is held in the hold space.
What I want is something like:
s/word/`&`/g
But the trick is, word will be contained not in the sed script but in the hold space. So it looks something more like:
H
g
s/^\(.*\)\n\(.*\)\1\(.*\)$/\2`\1`\3/
which will quote one occurrence of the word held in the hold space. I want to quote all of them, but I can't just add a g flag, because of the way this uses backreferences rather than a static regex.
H
g
s/^\(.*\)\n\(.*\)\1\(.*\)\1\(.*\)$/\2`\1`\3`\1`\4/
This handles exactly two occurrences of the word, but fails when there is only one, and ignores any beyond the second.
I thought I could use something clean and simple like:
s//`&`/g
But that reuses the last used regex, not what it matches. (Which makes sense.)
Is there any way in sed to do what I am trying to do? (Actually I would be interested in seeing how easy this would be in perl, but I would still like to see how to do it in sed.)
UPDATE
Not that it's needed for this question, but I thought I would give a little more context on what exactly I was doing when I came up with this question:
I had a big text file of documentation, certain parts of which needed to be condensed and summarized into an asciidoc table. It was pretty easy because of the Description: and Prototype: lines, etc., so I actually wrote a quick sed script to do all the parsing for me. It worked beautifully, but the one thing it was missing was that I wanted to backtick the words in the Description line that matched the arguments listed in the Prototype line. The prototype lines looked something like this:
Prototype: some_words_here(and, arg, list,here)
There were upwards of 200 different entries in the table I was outputting (and the source documentation included a lot more text than that) and each arglist only needed to be used to backtick-quote matching words on a single line. To make things trickier, some of the args were not in the Description line, some were in more than once, and some arglists were empty().
However, given that sometimes an arg would match a part of a word, which I didn't want to get backticked, and sometimes an arg name was a common word (like from) which I only wanted to get backticked when it was used in the context of explaining the use of the function, an automated solution wasn't actually a good fit at all and I instead used vim to do the job semi-manually, with the help of some tricky macros. 🙂
Best Answer
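That was a hard one. Assume you have a file whose first line is the word to search for (word) and whose second line is the text to process (line with a word and words and wording wordy). The goal is to quote every occurrence of the word in the second line as `word`.
Explanation of the sed command: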
1h;
save the first line to the hold space (this is what we want to search for: word).
2{...}
applies to the second line.
x;
exchange the pattern space and the hold space.
G;
append the hold space to the pattern space. The pattern space now holds the search word, a newline, and then the line to process.
:l;
set a label called l as a point to jump back to later.
s///
do the actual search/replace in the pattern space mentioned above:
  ^\([^\n]\+\)\n
  search in the "pattern line" for all characters (from the beginning of the line ^) which are not a newline [^\n] (one or more times \+), until a newline \n. This is now stored in the back-reference \1. It contains the "pattern line".
  \(.*[^`]\)
  search for any characters .* followed by a character which is not a backtick [^`]. This is stored in \2. \2 now contains the text line (line with a word and words and wording wordy) up to the last occurrence of word, because...
  \1
  is the next search term (the back-reference \1, word), hence what the "pattern line" contains.
  \([^`]\)
  this is followed by another character which is not a backtick; saved to reference \3. If we don't do this (and the part in \2 from above), we would end up in an endless loop quoting the same word again and again -> ````word````, because s/// would always be successful and tl; jumps back to :l (see tl; further down).
  \1\n\2`\1`\3
  all of the above is replaced by the back-references. The second \1 is the one we should quote (note the first reference is the "pattern line").
tl;
if the s/// was successful (we replaced something), jump to the label called l and start again until there is nothing more to search and replace. This is the case when all occurrences of word are replaced/quoted.
p;
when all is done, print the altered line (pattern space).
The output is the altered line with every occurrence of word wrapped in backticks.
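As a sketch of how the pieces explained above fit together, the following is a reconstruction assembled from the individual commands in the explanation; it is not necessarily the exact command from the original answer. It assumes GNU sed (\+ and \n inside a bracket expression are GNU extensions), and the sample input and the variable names input, pattern_space and result are chosen for illustration only.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. Sample input: line 1 is the word to quote,
# line 2 is the text to process.
input=$(printf '%s\n' 'word' 'line with a word and words and wording wordy')

# Show the pattern space right after x and G, using the l command, which
# prints embedded newlines as \n and marks the end of the buffer with $.
pattern_space=$(printf '%s\n' "$input" | sed -n '1h; 2{x; G; l;}')
printf '%s\n' "$pattern_space"
# prints: word\nline with a word and words and wording wordy$

# Full loop: each pass quotes the last not-yet-quoted occurrence of the word
# from line 1; tl repeats until the substitution no longer matches.
result=$(printf '%s\n' "$input" |
  sed -n '1h; 2{x; G; :l; s/^\([^\n]\+\)\n\(.*[^`]\)\1\([^`]\)/\1\n\2`\1`\3/; tl; p;}')
printf '%s\n' "$result"
# prints two lines (the word itself, then the altered text):
# word
# line with a `word` and `word`s and `word`ing `word`y
```

Two limitations follow directly from the regex as written: because \2 and \3 must each end or begin with a non-backtick character, an occurrence at the very start or very end of the text line is not matched, and partial matches inside longer words (words, wording) are quoted as well. The first line (the word itself) is also still part of the pattern space when p runs, so it is printed along with the altered text.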