Shell Script – Extract Text Between Triple Single Quotes

shell-script text-processing

I have the following in a file:

description: '''
        This rule forbids throwing string literals or interpolations. While
        JavaScript (and CoffeeScript by extension) allow any expression to
        be thrown, it is best to only throw <a
        href="https://developer.mozilla.org
        /en/JavaScript/Reference/Global_Objects/Error"> Error</a> objects,
        because they contain valuable debugging information like the stack
        trace. Because of JavaScript's dynamic nature, CoffeeLint cannot
        ensure you are always throwing instances of <tt>Error</tt>. It will
        only catch the simple but real case of throwing literal strings.
        <pre>
        <code># CoffeeLint will catch this:
        throw "i made a boo boo"

        # ... but not this:
        throw getSomeString()
        </code>
        </pre>
        This rule is enabled by default.
        '''

along with several other things in the same file.

I extract this part in my shell script with sed -n "/'''/,/'''/p" "$1" (where $1 is the file).

This gives me a variable whose content is a single line:

description: ''' This rule forbids throwing string literals or interpolations. While JavaScript (and CoffeeScript by extension) allow any expression to be thrown, it is best to only throw <a href="https://developer.mozilla.org /en/JavaScript/Reference/Global_Objects/Error"> Error</a> objects, because they contain valuable debugging information like the stack trace. Because of JavaScript's dynamic nature, CoffeeLint cannot ensure you are always throwing instances of <tt>Error</tt>. It will only catch the simple but real case of throwing literal strings. <pre> <code># CoffeeLint will catch this: throw "i made a boo boo" # ... but not this: throw getSomeString() </code> </pre> This rule is enabled by default. '''

How can I now extract the part between the ''' delimiters?

Or is there a better way to retrieve it directly from the multiline file?

I'm on OS X El Capitan 10.11.2 with GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin15).

Best Answer

perl -l -0777 -ne "print for /'''(.*?)'''/gs" file

would extract the text between each pair of ''' and print each match followed by a newline.
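As a rough usage sketch, the command can be captured into a shell variable. The sample file and its contents below are made up for illustration, and the variable name description is an assumption, not something from the question. Quoting the expansion is what keeps the line breaks; an unquoted $description would be word-split by the shell and come out as one line.

```shell
# Hypothetical sample file; the name and contents are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name: 'no_throwing_strings'
description: '''
        first line
        second line
        '''
level: 'error'
EOF

# Capture the text between the pair of ''' delimiters.
# -0777 reads the whole file as one record; -l appends a newline to each print.
description=$(perl -l -0777 -ne "print for /'''(.*?)'''/gs" "$sample")

# Quote the expansion so the embedded newlines survive.
printf '%s\n' "$description"

rm -f "$sample"
```

The captured text still carries the newline that follows the opening ''' and the indentation before the closing one, so it may need trimming depending on what is done with it next.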

Beware that, because of -0777, perl slurps the whole file into memory before it starts processing, so this solution may not be appropriate for very large files.
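If memory is a concern, a line-by-line alternative could look like the sketch below. It swaps in awk rather than perl, and it assumes each ''' sits on a delimiter line of its own, as in the question's file: any text sharing a line with ''' is dropped, which the perl command would keep. The sample file and the variable name body are hypothetical.

```shell
# Hypothetical sample file; the name and contents are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name: 'no_throwing_strings'
description: '''
        first line
        second line
        '''
level: 'error'
EOF

# Stream the file one line at a time instead of slurping it.
# Assumption: every line containing ''' is a pure delimiter line.
body=$(awk -v q="'''" '
  index($0, q) { inside = !inside; next }  # delimiter line: toggle state, skip it
  inside                                   # print every line inside a block
' "$sample")

printf '%s\n' "$body"

rm -f "$sample"
```

Only the current line is held in memory, at the cost of not handling an opening and closing ''' on the same line.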
