So I have a bunch of Apache logs using the standard log format. I want to get all the log lines that did not come from a web crawler.
So let's say I have a file robot_patterns with entries like:
Googlebot
msnbot-media
YandexBot
bingbot
If I run the command grep -f robot_patterns *.log, I get all the entries by bots matching the above patterns. My actual list has ~30 entries of bots and agents that I wish to ignore.
But I want to find all the entries that are NOT from bots. So I try grep -v -f robot_patterns *.log, and grep returns no results. This is not what I expect or desire, and I am not finding an obvious way to get what I want. It appears that when the -v option is combined with multiple patterns in a file, grep only returns a line if it matches EVERY pattern.
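To test that hypothesis, here is a minimal, self-contained reproduction. It assumes a POSIX shell and grep; the file names, bot list and log lines are made up for illustration. With a clean pattern file, grep -v -f keeps the lines that match none of the patterns; appending a single blank line to the pattern file makes the output empty.

```shell
#!/bin/sh
# Minimal reproduction in a scratch directory; all data here is made up.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Pattern file with two bot names and no blank lines.
printf '%s\n' Googlebot bingbot > robot_patterns

# Two made-up log lines: one from a crawler, one from a browser.
printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2020:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"' \
  '5.6.7.8 - - [01/Jan/2020:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"' \
  > access.log

# -v keeps the lines that match NONE of the patterns: only the browser line survives.
clean=$(grep -v -f robot_patterns access.log)
echo "without blank line: $clean"

# Append an empty line to the pattern file and run the same command again.
echo >> robot_patterns
broken=$(grep -v -f robot_patterns access.log)
echo "with blank line: '$broken'"

# Remove the scratch directory.
cd / && rm -rf "$dir"
```

If the first command prints the browser line and the second prints nothing, the "must match every pattern" explanation does not hold; the blank line is the trigger.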
Best Answer
If there is an empty line in the patterns file, it will match every line, causing no lines to be returned with -v. This is because the lines are interpreted as regular expressions, and an empty regular expression always matches.

This isn't a problem with -F, however, because grep ignores empty lines with -F. The -F option makes grep interpret the lines as simple strings to search for, and may speed up grep if regular expressions aren't needed.
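A more defensive variant, sketched below under the assumption of a POSIX shell and grep, does not depend on how a particular grep version treats blank patterns. It strips blank (or whitespace-only) lines from the pattern file first and then filters with -F. The file names robot_patterns.clean and access.log and the sample log lines are hypothetical.

```shell
#!/bin/sh
# Sketch: drop blank lines from the pattern file, then filter with
# fixed-string matching. All file names and data here are made up.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Pattern file ending in a stray blank line, the case that broke grep -v -f.
printf '%s\n' Googlebot msnbot-media YandexBot bingbot '' > robot_patterns

printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2020:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"' \
  '5.6.7.8 - - [01/Jan/2020:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"' \
  > access.log

# Keep only pattern lines containing at least one non-space character.
grep -v '^[[:space:]]*$' robot_patterns > robot_patterns.clean

# -F treats each pattern as a literal string; -v keeps lines matching none of them.
humans=$(grep -v -F -f robot_patterns.clean access.log)
printf '%s\n' "$humans"

# Remove the scratch directory.
cd / && rm -rf "$dir"
```

Against the real logs this becomes grep -v -F -f robot_patterns.clean *.log. With several files, grep prefixes each output line with its file name; adding -h suppresses that prefix.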