Extract lines that have two or more dots

grep, regular expression, search, text processing

I need to extract (or count) the lines (in a file)
that have two or more dots. The lines should not start with a dot
(it’s OK if they end with a dot), and there must not be two dots in a row
(i.e., the dots are all separated with non-dot characters).

Output Example:

a.b.
a.b.com
a.b.c.
a.b.c.com

But not:

a.com
a..b
a.b.c..d

I ran this command:

grep -P '^[^.]+\.([^.]+\.)+[.]+' file.txt | wc -l

but it didn't find any matching lines. 
How should I do this?

Best Answer

  1. \. and [.] are equivalent — they both match a literal dot, and not any other character.  As a matter of style, pick one and use it consistently.
  2. Your problem is that your regular expression (i.e., pattern) has ([^.]+\.)+ followed by [.]+.  Because a single dot is enough to satisfy [.]+, that’s roughly equivalent to [^.]+\. followed by [.], with the result that your grep is looking for lines that contain text.text.., i.e., two dots in a row.  If you check, you’ll see that your command matches the line a.b.. (with two trailing dots).
  3. OK, I believe that the fix is fairly simple:
    grep -P '^[^.]+\.([^.]+\.)+[^.]*$' file.txt
    I.e., change the [.] to [^.] (perhaps that’s what you meant originally?), change the following + to a *, and add a $.  The pattern then reads: after two or more text. groups, allow any number (zero or more) of characters other than dot, up to the end of the line.
  4. An even simpler approach (easier to understand) would be
    grep -P '^[^.]+\..*\.' file.txt | grep -v '\.\.'
    The first grep finds lines that begin with a non-dot character and include at least two dots.  The second grep removes lines that have two consecutive dots.
  5. Rather than do grep … | wc -l, just do grep -c ….
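
To check the points above against the examples from the question, here is a minimal sketch. The temporary file, the variable names, and the extra test line .a.b. are my own assumptions, not part of the question. The patterns use nothing Perl-specific, so I have swapped -P for -E here, which should behave the same on greps that lack -P.

```shell
#!/bin/sh
# Throwaway sample file holding the question's examples, plus one
# hypothetical line (.a.b.) that starts with a dot and must be rejected.
sample=$(mktemp)
printf '%s\n' 'a.b.' 'a.b.com' 'a.b.c.' 'a.b.c.com' \
              'a.com' 'a..b' 'a.b.c..d' '.a.b.' > "$sample"

# Original pattern from the question: it only accepts lines that
# contain two dots in a row, such as a.b..
broken=$(printf '%s\n' 'a.b..' 'a.b.com' | grep -E '^[^.]+\.([^.]+\.)+[.]+')

# Single-pattern fix: "text." groups, then optional non-dot text to end of line.
single=$(grep -E '^[^.]+\.([^.]+\.)+[^.]*$' "$sample")

# Two-step variant: at least two dots, then drop lines with consecutive dots.
piped=$(grep -E '^[^.]+\..*\.' "$sample" | grep -v '\.\.')

# Count matching lines directly with -c instead of piping to wc -l.
count=$(grep -cE '^[^.]+\.([^.]+\.)+[^.]*$' "$sample")

printf '%s\n' "$single"
echo "count: $count"
rm -f "$sample"
```

With this sample data, both variants should print the four wanted lines (a.b., a.b.com, a.b.c., a.b.c.com), and the count is 4.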