Extract lines that have two or more dots

grep, regular expression, search, text processing

I need to extract (or count) the lines (in a file)
that have two or more dots. The lines should not start with dot
(it’s OK if they end with a dot), and there must not be two dots in a row
(i.e., the dots are all separated with non-dot characters).

Output Example:

a.b.
a.b.com
a.b.c.
a.b.c.com

But not:

a.com
a..b
a.b.c..d

I did this command:

grep -P '^[^.]+\.([^.]+\.)+[.]+' file.txt | wc -l

but it didn't find any matching lines.
How should I do this?

Best Answer

\. and [.] are equivalent — they both match a literal dot, and not any other character. As a matter of style, pick one and use it consistently.
Your problem is that your regular expression (i.e., pattern) has ([^.]+\.)+ followed by [.]+. That’s really (sort of) equivalent to [^.]+\. followed by [.], with the result that your grep is looking for lines that begin with text.text.., i.e., lines that have two dots in a row. If you check, you’ll see that your command matches the line a.b.. (with two trailing dots).
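To see the problem for yourself, here is a quick check. It is only a sketch: the two test lines and the temporary file are made up for illustration, and I use -E instead of -P because this pattern contains no Perl-specific syntax, so both should behave the same here.

```shell
# Two test lines: one that should match, one with a doubled dot at the end.
sample=$(mktemp)
printf '%s\n' 'a.b.com' 'a.b..' > "$sample"

# The original pattern from the question (-E instead of -P; same meaning here).
original_matches=$(grep -E '^[^.]+\.([^.]+\.)+[.]+' "$sample")

# Prints only the line with the doubled dot; the valid line is rejected.
printf '%s\n' "$original_matches"
rm -f "$sample"
```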
OK, I believe that the fix is fairly simple:
```
grep -P '^[^.]+\.([^.]+\.)+[^.]*$'
```
I.e., change the [.] to [^.] (perhaps that’s what you meant originally?), change the following + to an *, and add a $. After some number of text. groups, require/allow any number (zero or more) characters other than dot, up to the end of the line.
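A quick way to convince yourself is to run the corrected pattern against the examples from the question. This is a minimal sketch: the temporary file and the extra test line .a.b. (a line starting with a dot) are my additions, and -E is used in place of -P on the assumption that the pattern needs no Perl-specific features.

```shell
# Sample lines from the question, plus one line that starts with a dot.
sample=$(mktemp)
printf '%s\n' a.b. a.b.com a.b.c. a.b.c.com a.com a..b a.b.c..d .a.b. > "$sample"

# Corrected pattern: text, dot, one or more "text dot" groups, optional trailing text.
pattern='^[^.]+\.([^.]+\.)+[^.]*$'

matches=$(grep -E "$pattern" "$sample")   # the matching lines
count=$(grep -cE "$pattern" "$sample")    # just the number of matching lines

printf '%s\n' "$matches"
echo "count: $count"
rm -f "$sample"
```

Only the four lines listed under "Output Example" should come out; the other four are rejected.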
An even simpler approach (easier to understand) would be
```
grep -P '^[^.]+\..*\.' file.txt | grep -v '\.\.'
```
The first grep finds lines that begin with a non-dot character and include at least two dots. The second grep removes lines that have two consecutive dots.
Rather than do grep … | wc -l, just do grep -c ….
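With the two-grep pipeline, the -c belongs on the last grep, since that is the one producing the final set of lines. A small sketch, again using a made-up temporary file holding the question's examples:

```shell
# Sample lines from the question.
sample=$(mktemp)
printf '%s\n' a.b. a.b.com a.b.c. a.b.c.com a.com a..b a.b.c..d > "$sample"

# First grep: starts with a non-dot and contains at least two dots.
# Second grep: -v drops lines with two consecutive dots, -c counts what is left.
count=$(grep -E '^[^.]+\..*\.' "$sample" | grep -vc '\.\.')

echo "$count"
rm -f "$sample"
```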

Related Solutions

Multiline grepping – What’s wrong with this expression

You can use this:

grep -Pzo '(?s)Reference.*?\.' tst.txt

where tst.txt is your input file. It is the same regex as yours, but with two new flags.

I added the -z flag, which makes grep treat the NUL character, rather than newline, as the line terminator. Since the file contains no NUL characters, grep sees the whole input as one big line, so the pattern can match across newlines.

The -o flag means that it only prints the matched part.

I got the following output:

Reference duiarneutdigane uditraenturida enudtiar.
Reference uiae uiaetrtdnsu iatdne uiatrdenu diaren uidtae
on line 23.
Reference uriadne udtiraeb unledut iaeru uilaedr
uiarnde line 234.
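One detail worth knowing: with -z, each match that -o prints is terminated by a NUL character rather than a newline, so piping through tr makes the output easier to read. The sketch below assumes GNU grep built with PCRE support (needed for -P), and the input text is invented for illustration.

```shell
# Hypothetical input: one reference spanning two lines, one on a single line.
input=$(mktemp)
printf '%s\n' 'Reference foo' 'bar.' 'unrelated text' 'Reference baz.' > "$input"

# -z: the whole file is one record; -o: print only the matches (NUL-terminated).
# tr turns the NUL terminators back into newlines for display.
refs=$(grep -Pzo '(?s)Reference.*?\.' "$input" | tr '\0' '\n')

printf '%s\n' "$refs"
rm -f "$input"
```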

Grepping string, but include all non-blank lines following each grep match

Using awk rather than grep:

awk '/FOO/ { if (matching) printf("\n"); matching = 1 }
     /^$/  { if (matching) printf("\n"); matching = 0 }
     matching' file

A version that enumerates the matches:

awk 'function flush_print_maybe() {
         if (matching) printf("Match %d\n%s\n\n", ++n, buf)
         buf = ""
     }
     /FOO/ { flush_print_maybe(); matching = 1 }
     /^$/  { flush_print_maybe(); matching = 0 }
     matching { buf = (buf == "" ? $0 : buf ORS $0) }
     END   { flush_print_maybe() }' file

Both awk programs use a very simple "state machine" to determine whether they are currently matching or not matching. A match of the pattern FOO will cause the program to enter the matching state, and a match of the pattern ^$ (an empty line) will cause it to enter the non-matching state.

An empty line is output between sets of matching data at every state transition out of the matching state (whether back into matching, on a new FOO line, or into non-matching).

The first program prints any line when in the matching state.
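As a small illustration of the first program, here it is run on a made-up six-line input (not the sample data from the question). Only the two FOO blocks should be printed, separated by one empty line.

```shell
# Hypothetical input: two FOO blocks separated by a blank line and other text.
input=$(mktemp)
printf '%s\n' 'header' 'this line contains FOO' 'this line is not blank' \
              '' 'ignored' 'FOO!' > "$input"

# Same program as above: FOO enters the matching state, an empty line leaves it,
# and the bare "matching" pattern prints the current line while in that state.
out=$(awk '/FOO/ { if (matching) printf("\n"); matching = 1 }
           /^$/  { if (matching) printf("\n"); matching = 0 }
           matching' "$input")

printf '%s\n' "$out"
rm -f "$input"
```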

The second program collects lines in a buf variable when in a matching state. It flushes (empties) this after possibly printing it (depending on the state), together with a Match N label at state transitions (when the first program would output an empty line).

Output of this last program on the sample data:

Match 1
this line contains FOO
this line is not blank

Match 2
This line also contains FOO

Match 3
This line contains FOO too
Not blank
Also not blank

Match 4
FOO!
Yet more random text

Match 5
FOO!
