Replace matching parentheses with enclosing content

Tags: awk, perl, sed, text-formatting, text-processing

What is a simple way to find matching pairs of parentheses and replace each pair with the content it encloses, using sed or awk in bash?

A minimal example would be:

Input:

(body1)

Output:

body1

Insufficient solution:

This could be done with

echo "(body1)" | sed 's/[()]//g'

Extended problem

Simply removing all opening and closing brackets will not suffice, though, since the ultimate goal is to remove certain (not all) TeX commands from a source file, for example:

Input:

Alea {\color{red}iacta} est. \textbf{Hic} forum est, populus {\color{red}properant}.

Output:

Alea iacta est. \textbf{Hic} forum est, populus properant.

So far I only managed to extract the text with:

awk -v FS="({\\color{red}|})" '{print $2}' $file.tex

Bonus

With sed -E 's/\{\\color\{red}([^{}]*)\}/\1/g' it is possible to remove only the \color{red} command; however, the start and end of the command must be on the same line.
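As a quick check, the single-line command can be applied to the example sentence. This is a minimal sketch assuming GNU sed; uncolor_red is a hypothetical helper name. printf '%s\n' is used instead of echo because some shells' echo (e.g. dash) treats the \c in \color as an escape that stops output.

```shell
# Hypothetical helper; assumes GNU sed (-E enables extended regexes).
# Replaces each {\color{red}TEXT} with TEXT, where TEXT contains no braces
# and the whole group sits on one line.
uncolor_red() {
  sed -E 's/\{\\color\{red\}([^{}]*)\}/\1/g'
}

printf '%s\n' 'Alea {\color{red}iacta} est. \textbf{Hic} forum est, populus {\color{red}properant}.' | uncolor_red
# prints: Alea iacta est. \textbf{Hic} forum est, populus properant.
```

The g flag is what handles both colored words on the same line, while \textbf{Hic} is left alone because it does not start with {\color{red}.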

How can I remove a command whose closing brace } is on a later line than its opening?
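One possible approach, assuming GNU sed: the -z option makes sed split its input on NUL bytes instead of newlines, so an ordinary text file (which contains no NULs) is read as a single record and the same pattern can match across line breaks. The helper name uncolor_red_multiline is hypothetical.

```shell
# Hypothetical helper; assumes GNU sed (-z and -E are GNU options).
# With -z the whole file becomes one pattern space, and [^{}]* can also
# match newline characters, so groups spanning lines are removed too.
uncolor_red_multiline() {
  sed -zE 's/\{\\color\{red\}([^{}]*)\}/\1/g'
}

# The colored group opens on the first line and closes on the second.
printf 'Alea {\\color{red}iacta\nalea} est.\n' | uncolor_red_multiline
```

This reads the entire file into memory, which is rarely a problem for .tex sources.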

Bonus Solution

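If anyone is interested, the following commands seem to solve the bonus problem:

sed -i -r 's#\{\\color\{red\}([^}]*)\}#\1#g' "$file.tex"
sed -i -r ':a;N;$!ba;s#\{\\color\{red\}([^}]*)\}#\1#g' "$file.tex"

The first command removes every {\color{red}...} group that opens and closes on the same line. The second first appends every line of the file to the pattern space (:a;N;$!ba), so the substitution can also match groups that span several lines; the g flag makes it remove all of them rather than only the first. Both commands assume GNU sed.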

Best Answer

Even the simple question you're starting with hides some complexity. I'd start with

sed -E 's/\(([^()]*)\)/\1/'

repeated until no pairs of parentheses remain. Each pass removes the first pair that contains no other parentheses, i.e. an innermost pair:

$ echo "((body))" | sed -E 's/\(([^()]*)\)/\1/'
(body)
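The "repeat until nothing changes" step can be done inside sed itself with a label and the t command, which branches back only if the previous substitution succeeded. This is a minimal sketch assuming GNU sed (the ;-separated label syntax is a GNU convenience); strip_parens is a hypothetical helper name.

```shell
# Hypothetical helper; assumes GNU sed.
# :a defines a label; s/// removes one innermost (...) pair; ta jumps back
# to :a while substitutions keep succeeding, so all pairs are stripped.
strip_parens() {
  sed -E ':a; s/\(([^()]*)\)/\1/; ta'
}

echo "((body))" | strip_parens      # prints: body
echo "(a (b) c)" | strip_parens     # prints: a b c
```

Unbalanced parentheses, such as a lone "(", are left untouched because the pattern needs both an opening and a closing parenthesis.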

As Kusalananda suggested, though, to strip TeX commands you should check out detex, which is available in TeX Live (and packaged by most distributions). Such processing requires more than matching parentheses or braces: you need to know a little about how the various commands behave. Even in your example, \color needs to be processed one way and \textbf another.
