The simplest method to count lines matching specific patterns, including ‘0’ if line is not found

grep sort uniq wc

I have very large logs (several gigabytes per day) that may, but need not, contain specific lines. I have to count the number of occurrences of each of these lines on a daily basis.

I have a file, patterns.in, that contains the desired lines. For example:

aaaa
bbbb
cccc
dddd
eeee
ffff

The log files can look like this:

asd
dfg
aaaa
aaaa
sa
sdf
dddd
dddd
dddd
dddd
ghj
bbbb
cccc
cccc
cccc
fgg
fgh
hjk

The first (and perhaps most obvious) approach is to use grep, sort and uniq in the following way:

grep -f patterns.in logfile.txt | sort | uniq -c

which gives the following result:

   2 aaaa
   1 bbbb
   3 cccc
   4 dddd

It is close to what I want to achieve, but my desired result is:

   2 aaaa
   1 bbbb
   3 cccc
   4 dddd
   0 eeee
   0 ffff

So the problem is: how do I print '0' if a line from the patterns.in file is not matched? It needs to be done in the simplest possible way, as all I have available is the Cygwin environment.

Best Answer

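How about feeding the pattern file back in as a data file, so that each pattern finds at least one match, and then subtracting one from the final reported count for each match? With two input files, grep prefixes every match with its file name, which the cut step strips off again: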

grep -f patterns.in logfile.txt patterns.in | cut -f2 -d':' | sort | uniq -c | awk '{print($1 - 1" "$2)}'
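The same idea (start every pattern at a known count, then adjust) can also be sketched as a single awk pass, which avoids sorting the matches and does not rely on cut, which would truncate log lines that themselves contain a colon. This is only a sketch under one assumption: a pattern counts when it equals an entire log line, not a substring or a regular expression. The function name count_patterns is hypothetical.

```shell
# count_patterns PATTERN_FILE LOG_FILE
# Prints "<count> <pattern>" for every line of PATTERN_FILE, zero counts included.
# Assumption: a pattern is counted only when it equals a whole log line.
count_patterns() {
    awk '
        # First file (patterns): remember each pattern in order, starting at 0.
        NR == FNR {
            if (!($0 in count)) {
                count[$0] = 0
                order[++n] = $0
            }
            next
        }
        # Second file (log): increment the counter of a known pattern.
        $0 in count { count[$0]++ }
        # Report in the order the patterns were listed.
        END {
            for (i = 1; i <= n; i++)
                printf "%4d %s\n", count[order[i]], order[i]
        }
    ' "$1" "$2"
}
```

Called as count_patterns patterns.in logfile.txt, it lists the patterns in the order they appear in patterns.in, so no sort step is needed.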

Related Solutions

Shell – Count lines matching pattern and matching previous line

If your grep is the GNU grep, here is a quick and dirty solution:

grep -A1 "Prepare to remove role" | grep "Delete Successful" | wc -l

The grep option -A1 tells grep to print the matching line AND one line following the matching line. The second grep then prints only the lines where the delete was successful.

Note that this will only work reliably when the "Prepare to remove role X" line is always immediately followed by the "Delete Successful" line.

Also note: you don't need wc -l because grep has that functionality built in:

grep -A1 "Prepare to remove role" | grep -c "Delete Successful"
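Where grep -A is not available, or to make the "immediately followed" requirement explicit, roughly the same count can be sketched in awk by remembering whether the previous line matched. The function name count_followed is hypothetical, and the input is assumed to be a regular file passed as the first argument.

```shell
# count_followed FILE
# Counts "Delete Successful" lines whose immediately preceding line
# contains "Prepare to remove role".
count_followed() {
    awk '
        # Count this line only if the previous line was a "Prepare" line.
        prev && /Delete Successful/ { n++ }
        # Record whether the current line is a "Prepare" line for the next round.
        { prev = ($0 ~ /Prepare to remove role/) }
        # n + 0 prints 0 instead of an empty string when nothing matched.
        END { print n + 0 }
    ' "$1"
}
```

Unlike the grep pipeline, this version does not count a "Delete Successful" that appears on the "Prepare to remove role" line itself.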

Print lines between (and including) two patterns

You are better off using awk or sed:

awk '/CK$/,/D$/' file.txt

sed -n '/CK$/,/D$/p' file.txt
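To see what the range expressions select, here is a small sketch on made-up input. The sample lines are hypothetical, since the original file.txt is not shown; they only assume that a block starts at a line ending in CK and ends at a line ending in D.

```shell
# Hypothetical sample input; the real file.txt is not shown in the source.
sample=$(mktemp)
printf '%s\n' 'header' 'blockCK' 'inside' 'closeD' 'footer' > "$sample"

# Both commands select the lines from the first one ending in "CK"
# through the next one ending in "D", inclusive.
awk_out=$(awk '/CK$/,/D$/' "$sample")
sed_out=$(sed -n '/CK$/,/D$/p' "$sample")

# Show the selected block (header and footer are left out).
printf '%s\n' "$awk_out"
rm -f "$sample"
```

On this input both commands select the same three lines: blockCK, inside and closeD.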

If you insist on grep, here's a GNU grep way:

grep -oPz '(?s)(?<=\n)\N+CK\n.*?D(?=\n)' file.txt

Here:

-P activates Perl-compatible regular expressions (PCRE)

-z sets the line separator to NUL. This forces grep to see the entire file as one single line

-o prints only the matching part

(?s) activates PCRE_DOTALL, so . matches any character, including a newline

\N matches any character except a newline

.*? matches any run of characters in non-greedy mode, i.e. as few as possible

(?<=..) is a look-behind assertion

(?=..) is a look-ahead assertion
