Remove a specific latex command from the text AND closing bracket behind it

awk, latex, perl, sed, text-processing

How can I remove a specific LaTeX command from the text, together with the closing brace behind it, while keeping the text inside the braces? In the following example the command to remove is \edit{<some stuff>}: \edit{ and } should be removed, whereas <some stuff> should be left unchanged.

Please feel free to suggest sed, awk, Perl, or whatever will do the job.

A nonsense example:

We \edit{Introduce a} model for analyzing \emph{data} from various
experimental designs, \edit{such as paired or \url{http://www/}
longitudinal; as was done 1984 by NN \cite{mycitation} and by NNN
\cite{mycitation2}}.

Note that there may be one or more LaTeX commands of the form \command{smth} inside an \edit{} statement. Each \command{smth} should be left as it is.

Output:

We Introduce a model for analyzing \emph{data} from various
experimental designs, such as paired or \url{http://www/}
longitudinal; as was done 1984 by NN \cite{mycitation} and by NNN
\cite{mycitation2}.

P.S. I am introducing a lot of small edits into my tex file. I want those edits to be highlighted so that my collaborator can see them. Afterwards, I would like to remove all the highlights and send the text to a reviewer.

The question was originally asked at "AWK/SED Remove a specific latex command from the text AND closing bracket behind it", but the example there was too simple.

Best Answer

Here's one that works in the simple case of at most one level of commands inside an \edit{...}:

perl -00 -lpe 's,\\edit\{( (?: [^}\\]* | \\[a-z]+\{[^}]*\} )+ )\},$1,xg'

The middle part (?: [^}\\]* | \\[a-z]+\{[^}]*\} )+ has two alternatives: [^}\\]* matches any string with no closing brace or backslash (regular text), and \\[a-z]+\{[^}]*\} matches a backslash, lowercase letters, and then a matched pair of braces (like \url{whatever...}). The grouping (?:...)+ repeats those alternatives, and the outer parentheses capture the result, so we can replace the whole match with just the part inside \edit{...}.

-00 tells Perl to handle the input one paragraph at a time, with empty lines separating paragraphs. If you need to handle tags that span paragraphs, change that to -0777 to slurp the whole input in one go (plain -0 also works as long as the input contains no NUL bytes).
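The difference between paragraph-at-a-time and whole-input processing can be illustrated with a small Python sketch. It only approximates Perl's paragraph mode by splitting on blank lines, and the function names strip_whole and strip_by_paragraph are hypothetical.

```python
import re

# Compact form of the one-level \edit{...} pattern.
EDIT = re.compile(r"\\edit\{((?:[^}\\]|\\[a-z]+\{[^}]*\})*)\}")


def strip_whole(text: str) -> str:
    """Apply the substitution to the whole input at once (slurp-like)."""
    return EDIT.sub(lambda m: m.group(1), text)


def strip_by_paragraph(text: str) -> str:
    """Apply the substitution to each blank-line-separated paragraph separately."""
    # The capturing group keeps the blank-line separators in the result list.
    parts = re.split(r"(\n{2,})", text)
    return "".join(strip_whole(part) for part in parts)


# An \edit{...} that spans a paragraph break, followed by an ordinary one.
text = "A \\edit{first\n\nsecond} B \\edit{ok}."

print(repr(strip_by_paragraph(text)))  # the spanning \edit{ is left in place
print(repr(strip_whole(text)))         # both \edit{...} are removed
```

In paragraph mode, the opening \edit{ and its closing brace end up in different records, so the pattern never sees them together.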

For your example, this seems to work, giving:

We Introduce a model for analyzing \emph{data} from various
experimental designs, such as paired or \url{http://www/}
longitudinal; as was done 1984 by NN \cite{mycitation} and by NNN
\cite{mycitation2}.

However, it (predictably) fails for an input with two levels of commands inside the \edit{...}:

Some \edit{\somecmd{\emph{nested} commands} here}.

It turns into:

Some \somecmd{\emph{nested} commands here}.

(the wrong closing brace is removed)


Actually handling balanced braces is somewhat trickier; it's discussed, e.g., in this question on SO: Perl regular expression: match nested brackets.
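If a regex is not a requirement, arbitrary nesting can also be handled by counting braces directly. The following Python function is a minimal sketch of that idea, not part of the original answer; strip_command and its name parameter are made up for illustration. It assumes that a backslash escapes the next character (so \{ and \} are not counted) and that comments or verbatim blocks containing unbalanced braces do not occur.

```python
def strip_command(text: str, name: str = "edit") -> str:
    """Remove \\name{ and its matching closing brace, keeping the content."""
    opener = "\\" + name + "{"
    out = []
    i = 0
    while i < len(text):
        start = text.find(opener, i)
        if start == -1:
            out.append(text[i:])       # no more occurrences: copy the rest
            break
        out.append(text[i:start])      # copy the text before \name{
        depth = 1                      # we are inside the opening brace
        j = start + len(opener)
        while j < len(text) and depth:
            ch = text[j]
            if ch == "\\" and j + 1 < len(text):
                j += 2                 # skip an escaped character such as \{ or \}
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            j += 1
        if depth:
            out.append(text[start:])   # unbalanced input: leave the rest untouched
            break
        inner = text[start + len(opener):j - 1]
        out.append(strip_command(inner, name))  # also handle \name{...} nested inside
        i = j
    return "".join(out)


print(strip_command(r"Some \edit{\somecmd{\emph{nested} commands} here}."))
```

On the nested input above, this removes the brace that actually closes \edit{ and leaves \somecmd{\emph{nested} commands} intact.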
