I have a file, let's call it filename.log, that contains lines like this:
(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7]
(2014-11-18 14:09:21,781), , xxxxxx.local, EventSystem, DEBUG FtpsFile disconnected from ftp server [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File Process@serverStatus on exit - 113 [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File Process@serverStatus on exit - 114 [pool-3-thread-7]
(2014-11-18 14:09:21,799), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]
I am trying to find the classes that produce the most frequent DEBUG messages.
In this example you can see FtpsFile and JobQueue are two of the classes producing a message.
I have this
cat filename.log | sed -n -e 's/^.*\(DEBUG \)/\1/p' | sort | uniq -c | sort -rn | head -10
This is meant to produce the class names and show me the 10 most frequent ones.
The problem is that this does not give me a count of 4 for the class FtpsFile. It counts each distinct FtpsFile log line as a separate entry.
How do I change the command above so that it grabs only the first word after DEBUG and ignores the rest of the line when counting?
Ideally I should get
4 FtpsFile
1 JobQueue
Best Answer
With GNU sed:
With grep:
With awk:
The last one can be done in pure awk, but for the sake of similarity I piped it to uniq.
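As a rough sketch of how the sed, grep, and awk approaches could look (not necessarily the exact commands of this answer), each pipeline below reduces a line to the first word after DEBUG before counting. The function names sample_log, top10, by_sed, by_grep, by_awk, and by_awk_only are hypothetical helpers introduced only for illustration. The sed expression is plain basic regex and does not depend on GNU extensions, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
# The five sample lines from the question (quoted heredoc, so $_Runnable is not expanded).
sample_log() {
cat <<'EOF'
(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7]
(2014-11-18 14:09:21,781), , xxxxxx.local, EventSystem, DEBUG FtpsFile disconnected from ftp server [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File Process@serverStatus on exit - 113 [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File Process@serverStatus on exit - 114 [pool-3-thread-7]
(2014-11-18 14:09:21,799), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]
EOF
}

# Shared tail of every pipeline: count identical lines, most frequent first, top 10.
top10() { sort | uniq -c | sort -rn | head -10; }

# sed: capture only the first word after "DEBUG " and discard the rest of the line.
by_sed() { sed -n 's/^.*DEBUG \([^ ]*\).*$/\1/p' | top10; }

# grep: -o prints just the matched "DEBUG <word>"; cut keeps the word itself.
by_grep() { grep -o 'DEBUG [^ ]*' | cut -d' ' -f2 | top10; }

# awk: locate the DEBUG field and print the field that follows it.
by_awk() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "DEBUG") { print $(i + 1); break } }' | top10
}

# awk doing the counting itself; sort is only used to order the result.
by_awk_only() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "DEBUG") { n[$(i + 1)]++; break } }
       END { for (c in n) print n[c], c }' | sort -rn | head -10
}

# Usage on the sample data; on a real file: by_sed < filename.log
sample_log | by_sed
```

On the sample lines, every variant counts FtpsFile 4 times and JobQueue once. The exact padding in front of the counts depends on the uniq implementation.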