Don't use --short-form. You are suppressing much of what you want to see.
--short-form
Display only the statements contained in the log, without any extra information or row-based events. This is for testing only, and should not be used in production systems.
— http://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog.html#option_mysqlbinlog_short-form
That is a largely pointless option, since it returns only the statements contained in the log. If you have a binlog that large, it's unlikely that it's full of statements -- it's full of row events.
--base64-output=decode-rows and --verbose are the options you are looking for.
http://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog-row-events.html
You will, at the very least, not be disappointed by the quantity of data returned... it will be much larger than the actual binlogs themselves, which are stored in a relatively tightly-packed format.
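For reference, an invocation along these lines is what those options look like in practice (a sketch; the binlog file name and the output destination are placeholders, not taken from the question):

```sh
# Decode row events into commented pseudo-SQL instead of printing raw base64 blobs.
# Giving -v twice (-vv) also adds column data-type comments.
mysqlbinlog --base64-output=decode-rows --verbose mysql-bin.000001 > decoded.sql
```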
Question: Does this mean the query is using indexes?
The Using intersect(FK_DataValues_Sites,FK_DataValues_Variables) means that the query is using two indexes on table datavalues with the Index Merge Intersection Access Algorithm, and then (Using where) the rows are additionally checked against some other conditions.
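For context, the query being explained presumably has roughly this shape (a sketch reconstructed from the conditions discussed below; the SELECT list and exact literals are assumptions):

```sql
EXPLAIN
SELECT *
FROM datavalues
WHERE SiteID = 15
  AND VariableID = 8
  AND LocalDateTime >= '2009-04-18T00:02'
  AND LocalDateTime <= '2009-04-20T00:02';
-- Extra: Using intersect(FK_DataValues_Sites,FK_DataValues_Variables); Using where
```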
Question: Any other suggestions? (I guess about efficiency)
This is most probably not the most efficient way, especially if the query is retrieving a small percentage of the table's rows.
For example, let's say that the SiteID = 15 condition narrows the search to 10% of the rows and the VariableID = 8 condition narrows the search to 5% of the rows. The two existing indexes can be used individually or combined (with the help of the index merge intersection algorithm) to narrow the search to (0.10 * 0.05 =) 0.5% of the table's rows (and that is what the MySQL engine is doing in this case.)
With a 280M-row table, this means that the first index can efficiently give us the 28M keys that match the 1st condition and the second index can give us the 14M keys that match the 2nd condition. Then, the index merge algorithm can further narrow this down to 1.4M keys (at the extra cost of this algorithm). Then, the 1.4M rows have to be retrieved and the 3rd condition (LocalDateTime >= '2009-04-18T00:02' AND LocalDateTime <= '2009-04-20T00:02'), which is not covered by any index, has to be checked for every one of these 1.4M rows.
If this 3rd condition narrows the results to, say, 5% (70K rows), the overall efficiency of the query will not be the best possible.
The best would be if there were an index that covered all 3 conditions. Fortunately, there is a way: a multi-column index on either (VariableID, SiteID, LocalDateTime) or on (SiteID, VariableID, LocalDateTime) would be enough to narrow the search to only the needed keys (in the previous example, to only 70K) and then only those rows would be retrieved from the table.
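A minimal sketch of adding such an index (the index name is illustrative and assumes the second column order above):

```sql
-- Sketch: a covering index for all three conditions; the index name is illustrative.
ALTER TABLE datavalues
  ADD INDEX ix_site_variable_datetime (SiteID, VariableID, LocalDateTime);
```

On a table with hundreds of millions of rows, building this index will take a while and consume extra disk space, so plan it for a low-traffic window.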
Extra question: Do I have to remove the current index and then replace it with a multi-column index?
No, I wouldn't remove any index; I'd just add the new one, unless I was sure I didn't need an existing index.
But analyzing queries (and indexing) should be done after examining all the queries against the database and the requirements (which queries need to be extra fast and optimized, which ones we can afford to wait longer for, how other operations will be affected if we add a lot of indexes, etc.)
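If you want to review what already exists before deciding, a standard statement such as the one below helps (the table name is taken from the discussion above):

```sql
-- List the indexes currently defined on the table before adding or dropping anything.
SHOW INDEX FROM datavalues;
```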
Best Answer
Typically, these kinds of decisions are made together with product managers. They are the ones who can say things like "customers should be able to read 2-year-old information, but not necessarily older than that".
MySQL (or any other DBMS, as far as I know) doesn't provide a way to find out when a row was last read.
But there are ways to find out related information. This won't solve your problem, but hopefully you'll find useful indications.
- Some tables may have a column that records the last update, with a default of CURRENT_TIMESTAMP, but check: if there is no default value, I wouldn't consider the column reliable. You can't be sure that the application always updates the column.
- You can add triggers AFTER UPDATE and AFTER DELETE. Triggers can write the value of the primary key into a separate table, using INSERT IGNORE. Keep them going for some time, and then check the oldest row that was updated/deleted. Writing such triggers manually for every table is impractical, but this can easily be automated (see the sketch after this list).
- The user_statistics plugin. It tells you how many times each table and index is read/written. If you're lucky, you'll find out that some tables are never read and can be archived. Even if not, the number of read rows per hour could possibly give you a hint (or not... this is highly dependent on your database logic, so I can't give more advice on this).
- The slow query log records queries that take more than long_query_time seconds. You can set long_query_time=0 to log all the queries (this is a good practice for many reasons). You can then grep for all queries that mention a certain table. You can make a script to run them all, and remember the oldest read row. Careful: you'll need to run the SELECTs but not the INSERTs, UPDATEs, etc. (the slow-log settings are included in the sketch after this list).
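For illustration, here is a minimal sketch of the trigger idea and of the slow-log settings. The table name mytable, its primary key id, and the audit table mytable_touched are hypothetical names, not from the question:

```sql
-- Hypothetical audit table: one row per primary-key value that gets touched.
CREATE TABLE mytable_touched (
  id INT NOT NULL PRIMARY KEY,
  first_seen TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- INSERT IGNORE silently skips keys that are already recorded.
CREATE TRIGGER mytable_au AFTER UPDATE ON mytable
FOR EACH ROW
  INSERT IGNORE INTO mytable_touched (id) VALUES (OLD.id);

CREATE TRIGGER mytable_ad AFTER DELETE ON mytable
FOR EACH ROW
  INSERT IGNORE INTO mytable_touched (id) VALUES (OLD.id);

-- Slow query log: log every statement so you can later grep for reads of a given table.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0;
```

Generating the pair of triggers for every table from information_schema is straightforward to script, which is presumably what "easily automated" means here.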