I asked a similar question about regular expressions with the grep command an hour ago. Pardon me if it would have been preferable to post in the same thread; if so, I will do that next time.
It might seem like basic syntax, but I'm trying to understand how regular expression pattern matching works, and the results I get seem to contradict the manual I'm reading (most likely I'm not interpreting the material properly).
A file contains the following list of words:
mael@mael-HP:~/repertoireVide$ cat MySQLServ
remembré
emmuré
emmené
dilemmes
jumeaux
écrémage
emmena
emmailloter
flemmard
The following command gives this output:
mael@mael-HP:~/repertoireVide$ grep -r 'emm*[a-f].[^ta]$'
MySQLServ:remembré
MySQLServ:emmené
MySQLServ:flemmard
I'm wondering why grep is not matching the word 'emmailloter', since 'emmailloter':
- contains 'em'
- contains a character in [a-f] afterwards: 'a'
- 'i' fulfills the '.' component
- does not end with either the character 't' or 'a'
Thanks.
Best Answer
The word 'emmailloter' contains much more than 'i' between the bits matched by '[a-f]' and '[^ta]$'. The '.' pattern only ever matches a single character, so if you want to match multiple characters between 'emma' and 'r' at the end, you will have to allow for multiple characters:
emm*[a-f]..*[^ta]$
With grep -E (enabling extended regular expressions), '..*' could be written '.+', i.e. "match at least one character". The expression '..*' reads as "match a character, and then possibly more characters". In the same way, 'emm*' could be replaced by 'em+', i.e. "'e' followed by at least one 'm'", if using grep -E.
This expression also matches 'emmailloter'.
When testing this, note that for the word 'remembré', the match will be 'emembré', not 'embré', because a match can now start at the first 'em' in the word.
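A minimal sketch of such a test, recreating the word list from the question in a temporary file that stands in for MySQLServ. The variable names are only for illustration, and the -o option is an extension found in GNU and BSD grep rather than something POSIX guarantees:

```shell
# Recreate the question's word list in a temporary file (stand-in for MySQLServ).
words=$(mktemp)
printf '%s\n' remembré emmuré emmené dilemmes jumeaux écrémage \
    emmena emmailloter flemmard > "$words"

# Basic regular expression: '..*' is one character followed by any number more.
bre_matches=$(grep 'emm*[a-f]..*[^ta]$' "$words")

# Extended regular expression: the same pattern written with '+'.
ere_matches=$(grep -E 'em+[a-f].+[^ta]$' "$words")

printf '%s\n' "$bre_matches"
# remembré
# emmené
# emmailloter
# flemmard

# -o prints only the part of each line that matched.
matched_parts=$(grep -o 'emm*[a-f]..*[^ta]$' "$words")
printf '%s\n' "$matched_parts"
# emembré
# emmené
# emmailloter
# emmard

rm -f "$words"
```

Both spellings of the expression select the same four lines, and the -o output shows that the match in 'remembré' starts at the first 'em'.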
One way to visualise the matches is to use sed to print only the matching lines, with each matched part of the regular expression in parentheses. This assumes that you are using a sed implementation that can match French characters and that the locale environment variables are properly set up for doing that. It is instructive to compare the result with what your original expression produces.
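A minimal sketch of that comparison, assuming a sed that handles UTF-8 input under the current locale. The capture groups, the variable names, and the temporary file standing in for MySQLServ are illustrative choices, not necessarily the exact commands originally used:

```shell
# Recreate the question's word list in a temporary file (stand-in for MySQLServ).
words=$(mktemp)
printf '%s\n' remembré emmuré emmené dilemmes jumeaux écrémage \
    emmena emmailloter flemmard > "$words"

# Revised expression: each piece of the pattern is a capture group, and the
# replacement puts every captured piece in parentheses. -n with the p flag
# prints only the lines where the substitution succeeded.
revised=$(sed -n 's/\(emm*\)\([a-f]\)\(..*\)\([^ta]\)$/(\1)(\2)(\3)(\4)/p' "$words")

# Original expression: a single '.' between [a-f] and [^ta]$.
original=$(sed -n 's/\(emm*\)\([a-f]\)\(.\)\([^ta]\)$/(\1)(\2)(\3)(\4)/p' "$words")

printf 'revised:\n%s\n\noriginal:\n%s\n' "$revised" "$original"
# Expected in a UTF-8 locale:
# revised:
# r(em)(e)(mbr)(é)
# (emm)(e)(n)(é)
# (emm)(a)(illote)(r)
# fl(emm)(a)(r)(d)
#
# original:
# rem(em)(b)(r)(é)
# (emm)(e)(n)(é)
# fl(emm)(a)(r)(d)

rm -f "$words"
```

With the original expression, the third group can hold only one character, so nothing can bridge the 'illote' between 'emma' and the final 'r' of 'emmailloter'.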