Regular Expression for Finding Double Characters in Bash

Tags: bash, command line, grep, linux, regular expression

I am looking for a regular expression that finds all occurrences of double characters in a text, a listing, etc. on the command line (Bash).

Main question: Is there a simple way to find sequences like aa, ll, ttttt, etc. by defining a regular expression that looks for n occurrences of the same character? I want to achieve this at a very basic level, on the command line, in a Linux shell.

After quite some research I came across the following answers, which gave me a hint as to where the solution might lie but also raised further questions:

a) (e)grep and the backslash issue

  • grep 'a\{2\}' looks for aa
  • egrep 'a{2}' looks for aa
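
The difference between the two commands above can be checked directly. The following is a minimal sketch, assuming GNU grep; the sample words are made up for illustration, and grep -E is used in place of egrep.

```shell
# Hypothetical sample input: one entry per line.
words=$(printf '%s\n' 'bazaar' 'banana' 'a{2}')

# Basic syntax (plain grep): the interval braces must be backslashed.
bre_hits=$(printf '%s\n' "$words" | grep 'a\{2\}')

# Extended syntax (grep -E): the braces are special without backslashes.
ere_hits=$(printf '%s\n' "$words" | grep -E 'a{2}')

# Basic syntax with unescaped braces: they are taken literally,
# so this only finds the line containing the text a{2}.
literal_hits=$(printf '%s\n' "$words" | grep 'a{2}')

printf 'basic:    %s\n' "$bre_hits"
printf 'extended: %s\n' "$ere_hits"
printf 'literal:  %s\n' "$literal_hits"
```

Both of the first two commands should report bazaar, while the third should report only the line a{2}.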

Question: Does the need for backslashes really depend on the command I use? If so, can anyone give me a hint as to what else I should take into account when using (e)grep here?

b) I found this answer here for my question, though it isn't exactly what I was looking for:

grep -E '(.)\1' filename finds lines in which the same character appears at least twice in a row, but it doesn't let me specify how often. This is close to what I am looking for, but I still want to set the number of repetitions.

I probably should split this into two or more questions, but then I don't want to flood this awesome site here.

P.S.: Another question, possibly off topic: is it in, inside, at, or on the shell? And is "on the command line" correct?

Best Answer

This really is two questions, and should have been split up. But since the answers are relatively simple, I will put them here. These answers are for GNU grep specifically.

a) egrep is the same as grep -E. Both indicate that Extended Regular Expressions (ERE) should be used instead of grep's default Basic Regular Expressions (BRE). Plain grep uses BRE, which is why it requires the backslashes.

From the man page:

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

See the man page for additional details about historical conventions and portability.
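
To illustrate the quoted passage, here is a small sketch that runs the same searches in both syntaxes, assuming GNU grep (the backslashed \? and \| forms in basic syntax are GNU extensions). The sample lines are hypothetical.

```shell
# Hypothetical sample input.
lines=$(printf '%s\n' 'color' 'colour' 'cat' 'dog' 'bird')

# Optional character: \? in basic syntax, ? in extended syntax.
bre_opt=$(printf '%s\n' "$lines" | grep 'colou\?r')
ere_opt=$(printf '%s\n' "$lines" | grep -E 'colou?r')

# Alternation: \| in basic syntax, | in extended syntax.
bre_alt=$(printf '%s\n' "$lines" | grep 'cat\|dog')
ere_alt=$(printf '%s\n' "$lines" | grep -E 'cat|dog')

printf '%s\n' "$bre_opt" "$bre_alt"
```

Each pair should select the same lines: color and colour for the first, cat and dog for the second.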

b) Use egrep '(.)\1{N}' and replace N with the number of repetitions you wish to match minus one (since the group (.) matches the first one). So to match a character repeated four times, use egrep '(.)\1{3}'. Note that this also matches longer runs, since a run of five contains a run of four.
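
A short sketch of this pattern in use, assuming GNU grep as in the rest of this answer (back-references in extended syntax and the -o option are not guaranteed by POSIX). The sample words are made up, and grep -E stands in for egrep.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'hello' 'aaa' 'tttt' 'abab' 'wheee')

# Any character followed by itself: two or more in a row.
twice=$(printf '%s\n' "$sample" | grep -E '(.)\1')

# One character plus three repeats: four or more in a row.
four=$(printf '%s\n' "$sample" | grep -E '(.)\1{3}')

# -o prints only the matching part, here each run of repeated characters.
runs=$(printf '%s\n' "$sample" | grep -oE '(.)\1+')

printf 'lines with a doubled character:\n%s\n' "$twice"
printf 'lines with four in a row:\n%s\n' "$four"
printf 'runs found:\n%s\n' "$runs"
```

With this input, the first search should list hello, aaa, tttt, and wheee; the second only tttt; and the third the runs ll, aaa, tttt, and eee.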
