Ubuntu – How to count number of partial occurrences of a string in a file

bash, command line

I have a file in which I need to count all partial matches for an input string.
Here is a simple example of what I need:

In a file with this content:

Good-Black-Cat
Bad-Red-Cat
Bad-Gray-Dog
Good-Golden-Dog
Bad-White-Dog
Good-Tabby-Cat
Bad-Siamese-Cat

I need to count how many times the partial string "Good-*-Cat" (where * could be anything; it doesn't matter) appears. The expected count is 2.

Any help will be appreciated.

Best Answer

Given

$ cat file
Good-Black-Cat
Bad-Red-Cat
Bad-Gray-Dog
Good-Golden-Dog
Bad-White-Dog
Good-Tabby-Cat
Bad-Siamese-Cat

then

$ grep -c 'Good-.*-Cat' file
2

Note that this is a count of matching lines - so for example it won't work for multiple occurrences per line, or for occurrences that span lines.
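To make that limitation concrete, here is a minimal sketch using a made-up one-line sample (not taken from the question's file) that contains two occurrences on the same line:

```shell
# Hypothetical sample: a single line holding two occurrences of the pattern
sample='Good-Black-Cat Good-Tabby-Cat'

# grep -c counts matching lines, not matches, so the result is 1 rather than 2
line_count=$(printf '%s\n' "$sample" | grep -c 'Good-.*-Cat')
echo "$line_count"
```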

Alternatively, with awk (the n+0 makes it print 0 rather than an empty line when nothing matches):

awk '/Good-.*-Cat/ {n++} END {print n+0}' file

If you need to match multiple possible occurrences per line, then I'd suggest perl:

perl -lne '$c += () = /Good-.*?-Cat/g }{ print $c' file

where /Good-.*?-Cat/g matches multiple times (g) and non-greedily (.*?). The () = assignment evaluates the match in list context, and that list assignment, itself evaluated in scalar context by +=, yields the number of matches, which we add to the count.

Alternatively, you could use grep in Perl-compatible regular expression (PCRE) mode (so as to enable the non-greedy modifier), with -o to output only the matching portions, then count those with wc:

grep -Po 'Good-.*?-Cat' file | wc -l
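If your grep was built without -P support, a possible workaround is a negated character class in place of the non-greedy quantifier. This is only a sketch, and it rests on an assumption the question does not state: that the middle part never contains a hyphen. The sample text is made up for illustration.

```shell
# Hypothetical sample: two occurrences on the first line, one on the second
sample='Good-Black-Cat Good-Tabby-Cat
Bad-Red-Cat Good-Grey-Cat'

# [^-]* cannot run past the next hyphen, so each match stays as short as possible
# (assumes the middle field itself contains no hyphen)
occurrences=$(printf '%s\n' "$sample" | grep -o 'Good-[^-]*-Cat' | wc -l)
echo "$occurrences"
```

Here grep -o prints each of the three matches on its own line, and wc -l counts those lines.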

If you also need to match occurrences that may span a line boundary, you can do so in perl by unsetting the record separator with -0777 (note: this means that the whole file is slurped into memory) and adding the s regex modifier so that . also matches newlines, e.g.

perl -0777 -nE '$c += () = /Good-.*?-Cat/gs }{ say $c' file
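As a rough alternative without perl, one could flatten the input first and then count as before. This is a different technique from the perl command above, not an equivalent: tr turns every newline into a space, and the negated class again assumes the middle part contains no hyphen. The sample is hypothetical.

```shell
# Hypothetical sample: the first occurrence is split across a line break
sample='Good-Black
-Cat Bad-Red-Cat
Good-Tabby-Cat'

# Join all lines into one (newlines become spaces), then count the matches
spanning=$(printf '%s\n' "$sample" | tr '\n' ' ' | grep -o 'Good-[^-]*-Cat' | wc -l)
echo "$spanning"
```

Like the perl version, this reads the whole input as a single record, so it is best suited to files that fit comfortably in memory.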