I am looking for a regular expression that finds all occurrences of double characters in a text, a listing, etc. on the command line (Bash).
Main Question: Is there a simple way to look for sequences like aa, ll, ttttt, etc., where one defines a regular expression that looks for n occurrences of the same character? I want to achieve this at a very basic level: on the command line, in a Linux shell.
After quite some research I came to the following answers, and to further questions resulting from them, since they only gave me a hint where the solution might be:
a) (e)grep and the backslash issue
grep 'a\{2\}' looks for aa
egrep 'a{2}' looks for aa
Question: Is the need for backslashes really tied to the command I use? If so, can anyone give me a hint what else has to be taken into account when using (e)grep here?
b) I found this answer here for my question, though it isn't exactly what I was looking for:
grep -E '(.)\1' filename
looks for entries in which the same character appears at least twice in a row, but doesn't let me say how often. This is close to what I am looking for, but I still want to set the number of repetitions.
I probably should split this into two or more questions, but then I don't want to flood this awesome site here.
P.S.: Another question, possibly off topic: is it "in", "inside", "at" or "on" the shell? And is "on the command line" correct?
Best Answer
This really is two questions, and should have been split up. But since the answers are relatively simple, I will put them here. These answers are for GNU grep specifically.
a) egrep is the same as grep -E. Both indicate that "Extended Regular Expressions" should be used instead of grep's default (basic) Regular Expressions. grep requires the backslashes for basic Regular Expressions. From the man page: in basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \). See the man page for additional details about historical conventions and portability.
b) Use egrep '(.)\1{N}' and replace N with the number of characters you wish to match minus one (since the dot matches the first one). So if you want to match a character repeated four times, use egrep '(.)\1{3}'. Note that this also matches longer runs, since a run of five contains a run of four.
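Here is a small sketch of both points that can be pasted into Bash. The function name has_run and the sample words are made up for illustration and are not part of grep. It uses grep -E rather than egrep, since the two are equivalent and recent GNU grep releases flag egrep as obsolescent. It also assumes GNU grep, because back-references in extended regular expressions are an extension rather than something POSIX guarantees.

```shell
#!/usr/bin/env bash
# Sketch only: has_run and the sample words are hypothetical.

# has_run N: read lines on stdin and print those that contain a run of
# at least N identical characters (N >= 2).
# (.) captures one character, \1{N-1} demands N-1 more copies of it.
has_run() {
    local n=$1
    grep -E "(.)\\1{$((n - 1))}"
}

printf '%s\n' ab aab aaab aaaab | has_run 2   # prints aab, aaab, aaaab
printf '%s\n' ab aab aaab aaaab | has_run 4   # prints aaaab

# Part a): the same interval written in basic and extended syntax.
printf '%s\n' a aa | grep 'a\{2\}'     # BRE: braces need backslashes
printf '%s\n' a aa | grep -E 'a{2}'    # ERE: braces are special as-is
```

Both commands in the last pair print aa. To see only the matched run instead of the whole line, adding -o to the grep call inside has_run should work with GNU grep.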