Invalid back reference using grep


So I am trying to find 6-letter words that consist of one character repeated three times followed by another character repeated three times, for example aaabbb or oookkk.

I am trying:

grep -E "[a-z]\1{3}\S[a-z]\1{3}" filename

First, is the regex correct? Second, why am I getting the error "grep: Invalid back reference"?

Best Answer

No, it's not correct. I'm not sure what the \1{3} was meant to do, but that is what's causing your problem. If you want to find lines that contain one character repeated three times followed by another character repeated three times, you can use this:

grep -E '([a-z])\1{2}([a-z])\2{2}'

The \1 refers to the first captured group. You capture groups with parentheses: \1 is the first such group, \2 is the second, and so on. Your pattern had no capturing groups, so \1 had nothing to refer to, and grep rejected it as an invalid back reference. In the regex above, each pair of parentheses captures one letter. You then want {2} rather than {3}, because the captured letter itself is the first of the three: ([a-z]) matches one a, and \1{2} matches the two that follow.
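A quick way to see both behaviours side by side, assuming GNU grep and a POSIX sh. The triple_runs function name and the sample words are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print lines containing one letter three times
# followed by a letter three times. ([a-z]) captures the first letter,
# \1{2} requires two more copies of it, and ([a-z])\2{2} does the same
# for the second captured letter.
triple_runs() {
  grep -E '([a-z])\1{2}([a-z])\2{2}'
}

# Made-up sample input, one candidate per line; prints aaabbb and oookkk.
printf '%s\n' aaabbb oookkk aabbbb abcdef | triple_runs

# The pattern from the question uses \1 before any group exists, so grep
# cannot compile it: it prints "Invalid back reference" on stderr and
# exits with status 2 (an error), not 1 (no match).
printf '%s\n' aaabbb | grep -E '[a-z]\1{3}\S[a-z]\1{3}' \
  || echo "grep failed with exit status $?"
```

The exit status is the useful part in scripts: 2 means the pattern itself is broken, which is easy to mistake for "nothing matched" if you only test for success.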

You don't specify whether the match has to be a whole word or whether it may also occur inside a longer word. If you want the entire word to match (and so exclude things like aaaabbb), use this instead:

grep -wE '([a-z])\1{2}([a-z])\2{2}'
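To see what -w changes, here is a small comparison, again assuming GNU grep. The pattern variable and the sample words are invented for the example:

```shell
#!/bin/sh
# The regex from above, stored once so both commands use the same pattern.
pattern='([a-z])\1{2}([a-z])\2{2}'

# Without -w: any line that merely contains such a run is printed, so
# aaaabbb (which contains aaabbb) and xoookkky (which contains oookkk)
# are reported as well.
printf '%s\n' aaabbb aaaabbb xoookkky | grep -E "$pattern"

# With -w: the match must be preceded and followed by a non-word character
# (or the line edge), so only aaabbb is printed.
printf '%s\n' aaabbb aaaabbb xoookkky | grep -wE "$pattern"
```

For grep, "word" characters are letters, digits and the underscore, so a run glued to any of those is rejected by -w.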

To print only the matched portion of the line (the word) rather than the entire line, add -o (not part of POSIX grep, but supported by GNU grep):

grep -owE '([a-z])\1{2}([a-z])\2{2}'
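A sketch of -o in action, assuming GNU grep; the sample line is invented. It also exposes something the pattern does not check: nothing forces the second letter to differ from the first, so aaaaaa is reported too. If that matters and your grep was built with PCRE support (-P), one possible fix is a negative lookahead, shown as a second command:

```shell
#!/bin/sh
pattern='([a-z])\1{2}([a-z])\2{2}'
# Invented sample line with several candidate words.
line='aaabbb oookkk aaaaaa aaaabbb'

# -o prints each whole-word match on its own line instead of the full line:
# aaabbb, oookkk and aaaaaa (aaaabbb is rejected by -w).
printf '%s\n' "$line" | grep -owE "$pattern"

# PCRE variant (requires grep -P): (?!\1) refuses a second letter equal to
# the first one, so aaaaaa is no longer reported.
printf '%s\n' "$line" | grep -owP '([a-z])\1{2}(?!\1)([a-z])\2{2}'
```

The lookahead is a PCRE feature; POSIX ERE has no equivalent, which is why the second command switches from -E to -P.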