BASH doesn’t like the regex

bash, regular expression

I'm trying to get the 2-digit month and 2-digit year a file was modified, but it's not working.

modified=$(stat -c %y "$line"); 
# modified="2018-08-22 14:39:36.400469308 -0400"
if [[ $modified =~ ".{2}(\d{2})-(\d{2})" ]]; then
    echo ${BASH_REMATCH[0]}
    echo ${BASH_REMATCH[1]}
fi

What am I doing wrong?

Best Answer

First, the quotes suppress the meaning of the special characters in the regex (online manual):

An additional binary operator, =~, is available, ... Any part of the pattern may be quoted to force the quoted portion to be matched as a string. ... If you want to match a character that’s special to the regular expression grammar, it has to be quoted to remove its special meaning.

The manual goes on to recommend putting the regex in a variable to prevent some clashes between the shell parsing and the regex syntax.
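
To illustrate the difference, here is a minimal sketch that runs the same pattern once quoted and once through a variable. It assumes Bash 3.2 or later with default compatibility settings; the variable names quoted_result and unquoted_result are only for illustration.

```shell
#!/usr/bin/env bash
# Sample timestamp taken from the question.
modified="2018-08-22 14:39:36.400469308 -0400"

# Quoted pattern: every character is matched literally, so this would only
# match a string containing the text "[0-9]{4}" itself.
if [[ $modified =~ "[0-9]{4}" ]]; then
    quoted_result="match"
else
    quoted_result="no match"
fi

# Pattern stored in a variable and expanded unquoted: treated as a regex.
re='[0-9]{4}'
if [[ $modified =~ $re ]]; then
    unquoted_result="match"
else
    unquoted_result="no match"
fi

echo "quoted:   $quoted_result"
echo "unquoted: $unquoted_result"
```

Keeping the pattern in a single-quoted variable also means the shell never sees characters such as parentheses, braces, or backslashes as its own syntax.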

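Second, \d doesn't do what you think it does. Bash's =~ uses POSIX extended regular expressions, which have no Perl-style \d shorthand, so it typically just matches a literal d. Use [0-9] or [[:digit:]] instead.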

Also note that ${BASH_REMATCH[0]} contains the whole matching string, and indexes 1 and up contain the captured groups.
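
As a small sketch of how the array is laid out, the snippet below matches the sample timestamp from the question and copies each element into a named variable. The names whole, year, and month are illustrative only.

```shell
#!/usr/bin/env bash
# Sample timestamp taken from the question.
modified="2018-08-22 14:39:36.400469308 -0400"
re='^([0-9]{4})-([0-9]{2})'

if [[ $modified =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # entire text matched by the regex
    year=${BASH_REMATCH[1]}    # first parenthesized group
    month=${BASH_REMATCH[2]}   # second parenthesized group
    echo "whole: $whole"
    echo "year:  $year"
    echo "month: $month"
fi
```

BASH_REMATCH is overwritten by every later =~ test, so copy out the values you need before matching again.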

I'd also strongly suggest using four-digit years, so:

modified=$(stat -c %y "$file")
re='^([0-9]{4})-([0-9]{2})'
if [[ $modified =~ $re ]]; then
    echo "year:  ${BASH_REMATCH[1]}"
    echo "month: ${BASH_REMATCH[2]}"
else
    echo "invalid timestamp"
fi

For the sample timestamp in the question, that gives year: 2018 and month: 08. Note that numbers with a leading zero are treated as octal by the shell in arithmetic contexts (and possibly by other utilities), so 08 and 09 are invalid there.
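
If you need to do arithmetic on the captured month, one way around the octal problem is Bash's base prefix. This is a minimal sketch; month_num and padded are hypothetical names.

```shell
#!/usr/bin/env bash
# A month value as captured from the timestamp.
month="08"

# $(( month + 1 )) would be an error here, because 08 is not valid octal.
# The 10# prefix tells Bash to read the digits as base 10.
month_num=$(( 10#$month ))
echo "month as a number: $month_num"

# Zero-pad again for display; printf -v stores the result in a variable.
printf -v padded '%02d' "$month_num"
echo "padded again: $padded"
```

The conversion is only needed when the value goes through arithmetic. Plain string comparisons and echo are unaffected by the leading zero.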

(Four-digit years cause fewer issues if you ever need to handle dates from the 1900s, and they're easier to recognize as years rather than days of the month.)
