Sed – How to Match Curly Braces {} with Sed


I want to strip the \includegraphics markup from .tex files so that I end up with a list of the file names, as illustrated in the following example. I want to remove the parts marked x and y and keep the part marked I:

something {\includegraphics[width=0.5\textwidth]{/tmp/myfile.pdf} somethingelse
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxIIIIIIIIIIIIIIIyyyyyyyyyyyyyyy

The following command does not work with GNU sed 4.5. How should I escape the braces so that the expression matches?

echo "something {\includegraphics[width=0.5\textwidth]{" | sed -e "s/^*.\\includegraphics\[*.\]\{//"

Best Answer

Don't escape the { or }. Doing so would make sed think you are using a regular expression repetition operator (as in \{1,4\} to match the previous expression between one and four times). This is a basic regular expression operator, and the extended regular expression equivalent is written without the backslashes.
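A small illustration of the difference in a basic regular expression (sed's default mode). It assumes GNU sed, as in the question, and the input strings are made up for the demonstration:

```sh
#!/bin/sh
# In a BRE, \{m,n\} is the interval (repetition) operator:
# a\{1,4\} greedily matches four of the five a's.
printf '%s\n' 'aaaaa' | sed 's/a\{1,4\}/X/'   # prints "Xa"

# Unescaped braces are ordinary characters in a BRE.
printf '%s\n' 'x{y}z' | sed 's/{y}/Y/'        # prints "xYz"
```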

In an extended regular expression (as used with sed -E), you do want to escape both { and }. If you find it hard to remember when to escape and when to not escape these characters, you may always use [{] and [}] to match them literally in both basic and extended expressions.
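The same two cases with sed -E, plus the bracket-expression form that behaves identically in both modes. This is a sketch assuming GNU sed; the input strings are invented for the example:

```sh
#!/bin/sh
# With -E (ERE), bare braces form the interval operator...
printf '%s\n' 'aaaaa' | sed -E 's/a{1,4}/X/'     # prints "Xa"

# ...so literal braces must be escaped in an ERE.
printf '%s\n' 'x{y}z' | sed -E 's/\{y\}/Y/'     # prints "xYz"

# [{] and [}] match literal braces in both BRE and ERE.
printf '%s\n' 'x{y}z' | sed 's/[{]y[}]/Y/'       # prints "xYz"
printf '%s\n' 'x{y}z' | sed -E 's/[{]y[}]/Y/'    # prints "xYz"
```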

You also use *. in two places where I think you mean .*. Incidentally, a * at the start of a regular expression (or just after ^ at the start) would match a literal * character.
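A quick check of both points, assuming GNU sed in its default (BRE) mode. The inputs are made up for the demonstration:

```sh
#!/bin/sh
# Right after ^ at the start of a BRE, * is an ordinary character,
# so ^*a matches a literal "*a" at the start of the line.
printf '%s\n' '*abc' | sed 's/^*a/X/'                    # prints "Xbc"

# .* is "any run of characters": here it consumes everything up to
# and including the last backslash on the line.
printf '%s\n' 'foo \includegraphics' | sed 's/^.*\\//'   # prints "includegraphics"
```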

As for the actual sed command, I would probably use the following:

sed 's/.*\\includegraphics.*{\([^}]*\)}.*/\1/' file.tex
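Running that command on the sample line from the question (fed through printf instead of file.tex) shows that only the file name survives. Using printf '%s\n' with a single-quoted argument keeps the backslashes in \includegraphics and \textwidth literal:

```sh
#!/bin/sh
# The sample line from the question; the capture group \([^}]*\)
# holds everything between the last matching { and the next }.
printf '%s\n' 'something {\includegraphics[width=0.5\textwidth]{/tmp/myfile.pdf} somethingelse' |
  sed 's/.*\\includegraphics.*{\([^}]*\)}.*/\1/'
# prints: /tmp/myfile.pdf
```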

To delete all lines that do not contain any \includegraphics command, you could add a simple d command:

sed -e '/\\includegraphics/!d' \
    -e 's/.*\\includegraphics.*{\([^}]*\)}.*/\1/' file.tex
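With a small multi-line input standing in for file.tex (the first and last lines are hypothetical filler), the /\\includegraphics/!d expression drops every line without the command before the substitution runs:

```sh
#!/bin/sh
# Three hypothetical lines; only the middle one uses \includegraphics.
printf '%s\n' \
  '\section{Intro}' \
  'something {\includegraphics[width=0.5\textwidth]{/tmp/myfile.pdf} somethingelse' \
  'plain text' |
sed -e '/\\includegraphics/!d' \
    -e 's/.*\\includegraphics.*{\([^}]*\)}.*/\1/'
# prints only: /tmp/myfile.pdf
```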

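This works on your example, but not if the somethingelse at the end of the line contains a { that is later followed by a }, because the greedy .* before { then skips ahead to that later group. It also keeps only the last file name when a line contains more than one \includegraphics command.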
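One possible tightening, sketched here as an assumption rather than a definitive fix: instead of .*, allow only an optional [...] option block between \includegraphics and the brace. This assumes the options themselves never contain a ], and it still extracts only the last \includegraphics on a line. It uses sed -n with the p flag so that only lines where the substitution succeeded are printed:

```sh
#!/bin/sh
# Hypothetical line whose trailing text contains its own {...} group.
line='something {\includegraphics[width=0.5\textwidth]{/tmp/myfile.pdf} some{x}else'

# The command above: the greedy .*{ jumps to the later {x} group.
printf '%s\n' "$line" | sed 's/.*\\includegraphics.*{\([^}]*\)}.*/\1/'
# prints: x

# \(\[[^]]*]\)\{0,1\} matches zero or one "[...]" block; the file name
# is then the second capture group.
printf '%s\n' "$line" |
  sed -n 's/.*\\includegraphics\(\[[^]]*]\)\{0,1\}{\([^}]*\)}.*/\2/p'
# prints: /tmp/myfile.pdf
```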
