I have a log file that looks something like the following:
query1 startQuery
query1 do something
query1 do something else
query2 startQuery
query1 do something banned
query2 do something
query3 startQuery
query2 endQuery 1000
query3 something else to do
query1 endQuery 2003
query3 do something
query4 startQuery
query4 endQuery 100
query3 endQuery 1434
I am finding the longest running queries:
> grep "endQuery" logfile | awk '{print $3 " " $1}' | sort -nr | head -n 3
2003 query1
1434 query3
1000 query2
However, there are certain operations known to be long, and I want to find the longest running queries that do not include these operations. For example, I want to find the longest running queries that do not, in any of their log lines, include the word "banned".
In this example it would output:
1434 query3
1000 query2
100 query4
In reality these log files are large and contain a lot of queries.
Best Answer
First, note that you don't need the call to grep, by the way: it can be seamlessly integrated into the awk call.
You can filter out the banned queries at the awk stage. Store ongoing queries in an array, remove them if they're banned, and only print out the non-banned ones.
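A minimal sketch of that approach, assuming the log format shown in the question (query name in the first field, `startQuery`/`endQuery` in the second, duration in the third) and that the word "banned" can appear anywhere on a line. The function name `longest_unbanned` and the file name `sample.log` are made up for illustration.

```shell
# Hypothetical sample log in the format from the question.
cat > sample.log <<'EOF'
query1 startQuery
query1 do something
query1 do something else
query2 startQuery
query1 do something banned
query2 do something
query3 startQuery
query2 endQuery 1000
query3 something else to do
query1 endQuery 2003
query3 do something
query4 startQuery
query4 endQuery 100
query3 endQuery 1434
EOF

# longest_unbanned FILE [N]: print the N longest queries (default 3)
# that never logged a line containing "banned".
longest_unbanned() {
  awk '
    # A query starts: remember it as ongoing.
    $2 == "startQuery" { ongoing[$1] = 1; next }
    # A banned operation: forget the query so it is never printed.
    /banned/           { delete ongoing[$1]; next }
    # A query ends: print "duration name" only if it is still tracked.
    $2 == "endQuery" && ($1 in ongoing) { print $3, $1 }
    # Whether printed or not, the query is finished, so free its slot.
    $2 == "endQuery"   { delete ongoing[$1] }
  ' "$1" | sort -nr | head -n "${2:-3}"
}

longest_unbanned sample.log 3
```

Because entries are deleted as soon as a query ends or is banned, the array only ever holds the queries currently in flight, which should keep memory use modest on large logs. One consequence of this design is that a query whose `startQuery` line is missing from the file (for example, a truncated log) is never reported.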