What is a simple way to find matching pairs of parentheses and replace each pair with the content it encloses, using sed/awk in bash?
A minimal example would be:
Input:
(body1)
Output:
body1
Insufficient solution:
This could be done with
echo "(body1)" | sed 's/[()]//g'
Extended problem
But simply removing all opening/closing parentheses will not suffice, since the ultimate goal is to remove certain, but not all, (TeX) commands from a source file, where the delimiters are braces rather than parentheses, such as
Input:
Alea {\color{red}iacta} est. \textbf{Hic} forum est, populus {\color{red}properant}.
Output:
Alea iacta est. \textbf{Hic} forum est, populus properant.
So far I only managed to extract the text with:
awk -v FS="({\\color{red}|})" '{print $2}' $file.tex
Bonus
With
sed -E 's/\{\\color\{red}([^{}]*)\}/\1/g'
it is possible to remove only the \color{red} command; however, the start and end of the command need to be on the same line.
How can I remove a command that spans multiple lines before the closing brace }?
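To make the same-line limitation concrete, here is a small sketch. The helper name strip_red and the two-line sample text are made up for illustration; the substitution is the one from above with both braces escaped.

```shell
# strip_red: hypothetical helper; removes {\color{red}...} wrappers line by line.
strip_red() {
  sed -E 's/\{\\color\{red\}([^{}]*)\}/\1/g'
}

# Opening and closing brace on one line: the wrapper is removed.
printf '%s\n' 'Alea {\color{red}iacta} est.' | strip_red

# Wrapper split over two lines: sed processes each line on its own,
# so neither line matches and both are printed unchanged.
printf '%s\n' 'populus {\color{red}properant' 'celeriter}.' | strip_red
```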
Bonus Solution
If someone is interested, the following commands seem to solve the bonus problem:
sed -i -r 's#\{\\color\{red\}([^}]*)\}#\1#g' $file.tex
sed -i -r ':a;N;$!ba;s#\{\\color\{red\}([^}]*)\}#\1#g' $file.tex
The first command removes all pairs of {\color{red} and } that sit on a single line. The second command removes all pairs that span multiple lines.
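For readers unfamiliar with the :a;N;$!ba idiom, the following annotated sketch shows what the multi-line command does. It assumes GNU sed, reads from standard input instead of editing a file in place, and uses a hypothetical helper name (strip_red_multiline) and made-up sample text. The g flag is included so that every wrapper in the collected input is replaced, not only the first.

```shell
# strip_red_multiline: hypothetical helper; reads stdin, writes stdout (no -i).
strip_red_multiline() {
  # :a     define the label "a"
  # N      append the next input line to the pattern space
  # $!ba   if this is not the last line, branch back to "a"
  # s#..#g once the whole input is collected, strip every wrapper;
  #        [^}]* also matches newlines, so a wrapper may span lines
  sed -E ':a;N;$!ba;s#\{\\color\{red\}([^}]*)\}#\1#g'
}

printf '%s\n' 'Alea {\color{red}iacta' 'est}. Hic {\color{red}forum} est.' \
  | strip_red_multiline
```

Two caveats apply to this sketch. Because [^}]* stops at the first closing brace, a wrapper that itself contains another braced group is cut short. And on a one-line input, N has no next line to read, so GNU sed prints the line and exits before the substitution runs.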
Best Answer
Even the simple question you're starting with hides some complexity. I'd start with
repeated until there are no parenthesis pairs. This replaces the innermost text:
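The command this answer started from is not reproduced here. What follows is only a minimal sketch of the approach described (replace the innermost pair, then repeat until nothing matches), not necessarily the answerer's exact command. The helper name strip_parens and the second sample line are made up for illustration.

```shell
# strip_parens: hypothetical helper; removes matched ( ) pairs, innermost first.
strip_parens() {
  # :a                   define the label "a"
  # s/(\([^()]*\))/\1/g  replace each "(text)" with no parentheses inside by "text"
  # ta                   if a substitution was made, branch back to "a" so the
  #                      next nesting level is handled
  sed -e ':a' -e 's/(\([^()]*\))/\1/g' -e 'ta'
}

printf '%s\n' '(body1)' | strip_parens                  # prints: body1
printf '%s\n' 'f((a) (b)) and ) stray' | strip_parens   # prints: fa b and ) stray
```

Unlike sed 's/[()]//g', this leaves an unmatched parenthesis in place, since only complete pairs are rewritten.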
As suggested by Kusalananda though, to strip TeX commands you should check out detex, which is available in TeX Live (and in most distributions). Such processing requires more than matching parentheses or braces: you need to know a little about various commands' behaviour. Even in your example, \color needs to be processed one way, \textbf another...