Analyse Queries with Pt-query-digest
I am using pt-query-digest to analyse queries and would like to analyse a specific time range only, in order to measure the effect of adding new indices and tuning the server.
The Query log is of course turned on (MySQL 5.6) and queries are written to the log files as expected. I transfer the log files to my local machine and concatenate all compressed log files into one file like this:
zcat -f $(ls -tr mysql-slow.log*) > MySQL-All-Slowlogs.log
When I then execute pt-query-digest without any parameters, I get the output as expected:
pt-query-digest MySQL-All-Slowlogs.log > output.log
The date range in this output file is the whole range that is covered within the log files and the header states this:
# 490ms user time, 10ms system time, 35.09M rss, 103.70M vsz
# Current date: Fri Apr 7 10:58:38 2017
# Hostname: LocalMachine
# Files: MySQL-All-Slowlogs.log
# Overall: 2.33k total, 125 unique, 0.00 QPS, 0.05x concurrency __________
# Time range: 2017-03-31 03:16:49 to 2017-04-07 06:00:10
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
Thats fine. Now I would like to investigate on the queries from the last two days, for instance
Problem
Pt-query-digest does not respect date ranges I provide, I still get the full list starting end of march (2017-03-31)
# Gives all queries
pt-query-digest --since 2d MySQL-All-Slowlogs.log > output.log
# Also all queries
pt-query-digest --since '2017-04-05' MySQL-All-Slowlogs.log > output.log
And specifying the until date gives no results at all
pt-query-digest --since '2017-04-05' --until '2017-04-07' MySQL-All-Slowlogs.log > output.log
# No events processed.
I also provided a DSN with the same results while connecting via I an SSH tunnel
pt-query-digest --since '2017-04-05' h=127.0.0.1,P=$PORT,u=$USER,p=$PASSWORD MySQL-All-Slowlogs.log > output.log
Expected Result
The documentation says this:
--since
type: string
Parse only queries newer than this value (parse queries since this date).
This option allows you to ignore queries older than a certain value and parse only those queries which are more recent than the value. The value can be several types:
* Simple time value N with optional suffix: N[shmd], where
s=seconds, h=hours, m=minutes, d=days (default s if no suffix
given); this is like saying "since N[shmd] ago"
* Full date with optional hours:minutes:seconds:
YYYY-MM-DD [HH:MM:SS]
* Short, MySQL-style date:
YYMMDD [HH:MM:SS]
* Any time expression evaluated by MySQL:
CURRENT_DATE - INTERVAL 7 DAY
If you give a MySQL time expression, and you have not also specified a DSN for --explain, --processlist, or --review, then you must specify a DSN on the command line so that pt-query-digest can connect to MySQL to evaluate the expression.
The MySQL time expression is wrapped inside a query like “SELECT UNIX_TIMESTAMP(<expression>)”, so be sure that the expression is valid inside this query. For example, do not use UNIX_TIMESTAMP() because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.
Events are assumed to be in chronological: older events at the beginning of the log and newer events at the end of the log. --since is strict: it ignores all queries until one is found that is new enough. Therefore, if the query events are not consistently timestamped, some may be ignored which are actually new enough.
What am I missing here?
Thanks!
Best Answer
Preserve the correct sequence
I finally figured out the problem. The way how I extract all the gzipped log files and add them to a consolidated log file did not preserve the correct sequence and therefore pt-query-digest did not find the queries in the order it was expecting them.
The manual even says:
Solution
The solution is to process the logrotate files in reverse order in order to end up with a logfile that has the oldest logs first and the newest ones at the end of the file.
The command above does the following things:
Then you can use the following comand to obtain