Sed to match pattern between matching curly braces

Tags: escape-characters, osx, regular expression, sed

From a pattern such as

[string 1]{string 2}

I want to extract string 2, the string between the last pair of matching curly braces; that is, delete [string 1] together with the opening { and the closing }. My attempt below breaks when there are additional [ ] pairs in either string 1 or string 2.

Desired Output:

The desired output from the script below begins with foo and ends with a digit:

foo bar 1
foo bar 2
foo[3]{xyz} bar 3
foo $sq[3]{xyz}$ bar 4
foo $sq[3]{xyz}$ bar 5
foo $sq[3]{xyz}$ bar 6
foo $sq[3]{xyz}$ bar 7
foo $sq[3]{xyz}$ bar 8
foo $sq[abc]{xyz}$ bar 9
foo $sq[abc]{xyz}$ bar 10

Assumptions:

  • Parameter to RemoveInitialSquareBraces always begins with a [ and ends with a }.
  • The opening [ of string 1 is matched by the ] that immediately precedes the opening { of string 2.

Platform:

  • MacOS 10.9.5

Script

#!/bin/bash

function RemoveInitialSquareBraces {
    #EXTRACTED_TEXT="$(\
    #      echo "$1" \
    #    | sed 's/^\[.*\]//'              \
    #    | sed 's/{//'                    \
    #    | sed 's/}$//'                   \
    #    )"
    EXTRACTED_TEXT="$(\
          echo "$1" \
        | sed 's/.*[^0-9]\]{\(.*\)}/\1/' \
        )"
        
    echo "${EXTRACTED_TEXT}"
}

RemoveInitialSquareBraces '[]{foo bar 1}'
RemoveInitialSquareBraces '[abc]{foo bar 2}'
RemoveInitialSquareBraces '[]{foo[3]{xyz} bar 3}'
RemoveInitialSquareBraces '[]{foo $sq[3]{xyz}$ bar 4}'
RemoveInitialSquareBraces '[goo{w}]{foo $sq[3]{xyz}$ bar 5}'
RemoveInitialSquareBraces '[goo[3]{w}]{foo $sq[3]{xyz}$ bar 6}'
RemoveInitialSquareBraces '[goo[3]{w} hoo[3]{5}]{foo $sq[3]{xyz}$ bar 7}'
RemoveInitialSquareBraces '[goo[3]{w} hoo[3]{5}]{foo $sq[3]{xyz}$ bar 8}'
RemoveInitialSquareBraces '[goo[3]{w} hoo[xyz]{5}]{foo $sq[abc]{xyz}$ bar 9}'
RemoveInitialSquareBraces '[goo[3]{w} hoo[xyz]{uvw}]{foo $sq[abc]{xyz}$ bar 10}'

exit 0

Best Answer

For the input examples above, the script can be:

sed s/[^\"\']*[^0-9]\]{\(.*\)}/\1/ <<\END
"[]{foo bar 1}"
"[abc]{foo bar 2}"
"[]{foo[3]{xyz} bar 3}"
"[]{foo $sq[3]{xyz}$ bar 4}"
"[goo{w}]{foo $sq[3]{xyz}$ bar 5}"
"[goo[3]{w}]{foo $sq[3]{xyz}$ bar 6}"
"[goo[3]{w} hoo[3]{5}]{foo $sq[3]{xyz}$ bar 7}"
END

produces

"foo bar 1"
"foo bar 2"
"foo[3]{xyz} bar 3"
"foo $sq[3]{xyz}$ bar 4"
"foo $sq[3]{xyz}$ bar 5"
"foo $sq[3]{xyz}$ bar 6"
"foo $sq[3]{xyz}$ bar 7"

Your function can also be simplified:

function RemoveInitialSquareBraces {
    printf '%s\n' "$@" |
    sed ...
}

This way it accepts any number of arguments, one per input line.
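
As a minimal sketch of how the multi-argument form behaves: printf reuses its format for each argument, so every argument reaches sed as a separate line. The sed expression here is only a placeholder borrowed from the question's script (the answer leaves it elided), so it shares that expression's limits on the harder inputs.

```shell
# Sketch only: the sed expression is an assumption, copied from the
# question's script as a stand-in for the elided "sed ...".
RemoveInitialSquareBraces() {
    # printf repeats '%s\n' for every argument, so each argument
    # becomes one input line for sed.
    printf '%s\n' "$@" |
    sed 's/.*[^0-9]\]{\(.*\)}/\1/'
}

RemoveInitialSquareBraces '[]{foo bar 1}' '[abc]{foo bar 2}'
# prints:
# foo bar 1
# foo bar 2
```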

Update: for the more general case you can do the task in two steps:

sed -e "
# step 1: collapse a leading [...] that contains nested square brackets to []
s/\[.*\[.*\][^[]*\]/[]/
# step 2: strip the leading [...] and the outermost curly braces
s/\[[^]]*\]{\(.*\)}/\1/
"

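The two steps can be tried out roughly as follows. This is a sketch that assumes the printf-based function form suggested earlier, passes each step as its own -e expression, and writes the closing brace unescaped; the two inputs are taken from the question's script.

```shell
# Sketch: two-step substitution wrapped in the multi-argument function.
RemoveInitialSquareBraces() {
    printf '%s\n' "$@" |
    sed -e 's/\[.*\[.*\][^[]*\]/[]/' \
        -e 's/\[[^]]*\]{\(.*\)}/\1/'
    # first -e : collapse a leading [...] with nested [ ] pairs to []
    # second -e: drop the leading [...] and the outermost { }
}

RemoveInitialSquareBraces \
    '[goo[3]{w}]{foo $sq[3]{xyz}$ bar 6}' \
    '[goo[3]{w} hoo[xyz]{uvw}]{foo $sq[abc]{xyz}$ bar 10}'
# prints:
# foo $sq[3]{xyz}$ bar 6
# foo $sq[abc]{xyz}$ bar 10
```
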
Addition: you can use grep in Perl mode (-P is a GNU grep extension, so this assumes GNU grep is installed):

grep -Po '\[([^][]*\[\w+\][^][]*)*\]{\K.*(?=})'

or sed with the same regexp (the \w and \+ used here are GNU extensions):

sed 's/\[\([^][]*\(\[\w\+\][^][]*\)*\)*\]{\(.*\)}/\3/'
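
If GNU sed is not available, the same pattern can probably be written with POSIX basic regular expression syntax only. In this sketch, [[:alnum:]_] stands in for \w and a repeated atom followed by * stands in for \+; the function name StripLeadingBrackets is hypothetical and exists only for illustration.

```shell
# Sketch: POSIX-only spelling of the expression above.
# StripLeadingBrackets is a hypothetical helper name.
StripLeadingBrackets() {
    printf '%s\n' "$@" |
    sed 's/\[\([^][]*\(\[[[:alnum:]_][[:alnum:]_]*\][^][]*\)*\)*\]{\(.*\)}/\3/'
    # \1 and \2 cover the leading [...] with one level of nested [word];
    # \3 is everything between the outermost { and the last }.
}

StripLeadingBrackets '[goo[3]{w} hoo[xyz]{5}]{foo $sq[abc]{xyz}$ bar 9}'
# prints: foo $sq[abc]{xyz}$ bar 9
```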