The query optimizer is free to rearrange the join order of tables in a query into any logically consistent sequence, based on its estimates of the costs of the query... unless you use STRAIGHT_JOIN, which forces the optimizer to read the left table before the right table in that particular join. (In MySQL, you can also SELECT STRAIGHT_JOIN ..., which forces all the tables to be handled in the order specified in the FROM clause.)
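As an illustration, the two forms look like this (the tables t1, t2, t3 and their join columns are hypothetical, not from your query):

```sql
-- Per-join form: forces t1 to be read before t2 in this one join.
SELECT t1.*
FROM t1
STRAIGHT_JOIN t2 ON t2.t1_id = t1.id
JOIN t3 ON t3.t2_id = t2.id;

-- Query-wide form: forces every table to be read in FROM-clause order.
SELECT STRAIGHT_JOIN t1.*
FROM t1
JOIN t2 ON t2.t1_id = t1.id
JOIN t3 ON t3.t2_id = t2.id;
```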
The reason for doing this is to force the optimizer to choose a plan that you know to be better than the one it's choosing on its own. In your case, sometimes that's a better plan, and sometimes it isn't.
You only posted one EXPLAIN, but I strongly suspect you'll find the EXPLAIN to be different for the query without the STRAIGHT_JOIN, which will probably make the performance discrepancy more readily apparent. It's almost inconceivable that the plan is the same, since the performance is so different.
There's another problem with the design of your query, which might be contributing to the poor performance when the query plan changes:
WHERE ...
DATE(`Mention`.`indexed`) BETWEEN "2012-11-04" AND "2012-12-04"
This is syntactically valid, but bad practice, because you're telling the server: "for each row we haven't eliminated with other attributes in the WHERE clause or joins, evaluate Mention.indexed using the DATE() function, and eliminate the rows where the resulting answer is not between '2012-11-04' and '2012-12-04'."
Change to this:
WHERE ...
`Mention`.`indexed` BETWEEN '2012-11-04'
AND DATE_SUB(DATE_ADD('2012-12-04',INTERVAL 1 DAY),INTERVAL 1 SECOND)
The optimizer will evaluate the two expressions only once, and the second expression evaluates to '2012-12-04 23:59:59'. So now you have two constants, which can be used to match rows against the index on Mention.indexed using a range scan, if the optimizer thinks that's a good idea. As your query is written, that index can't be used for filtering rows.
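For the curious, the boundary arithmetic in that second expression can be checked outside MySQL. This sketch mirrors the DATE_ADD/DATE_SUB steps with Python's datetime, purely for illustration:

```python
from datetime import datetime, timedelta

end_date = datetime(2012, 12, 4)
# DATE_ADD('2012-12-04', INTERVAL 1 DAY) -> midnight of the next day
next_day = end_date + timedelta(days=1)
# DATE_SUB(..., INTERVAL 1 SECOND) -> the last whole second of Dec 4
upper_bound = next_day - timedelta(seconds=1)
print(upper_bound)  # 2012-12-04 23:59:59
```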
"But wait," someone says, "the EXPLAIN
says it's using that index." Yes, it's using it to sort the results, but it's not using it for eliminating non-matching rows, because putting a formula on the left side of the where clause almost always eliminates the possibility of an index being used on the columns being passed as arguments into the function.
When you see Using where in the Extra column, that is the optimizer saying "With the query plan I've selected, I'm going to have to ask the underlying storage engine for more rows from this table than we actually want, and filter them at the MySQL layer using something from the WHERE clause to find what we actually need."
This should work if you want to join two tables by day and do counts on different conditions of the columns.
SELECT *
FROM (
    SELECT
        SUM(CASE WHEN crit1 = "AAA" THEN 1 ELSE 0 END) AS TheAs,
        SUM(CASE WHEN crit1 = "BBB" THEN 1 ELSE 0 END) AS TheBs,
        SUM(CASE WHEN crit3 = "CCC" THEN 1 ELSE 0 END) AS TheCs,
        DATE(time) AS someDay
    FROM recent_items
    GROUP BY DATE(time)
) AS recitems
JOIN (
    SELECT
        SUM(CASE WHEN crit1 = "AAA" THEN 1 ELSE 0 END) AS TheAs,
        SUM(CASE WHEN crit1 = "BBB" THEN 1 ELSE 0 END) AS TheBs,
        SUM(CASE WHEN crit3 = "CCC" THEN 1 ELSE 0 END) AS TheCs,
        DATE(time) AS someDay
    FROM insider_trades
    GROUP BY DATE(time)
) AS InInfo ON recitems.someDay = InInfo.someDay
This may be slow depending on the size of the data that is being queried. Limiting the number of days you are trying to get data for by adding where clauses to the inner queries will help.
Best Answer
You can use the ADDTIME() function to combine the separate date and time columns into a single DATETIME for the comparison. This might use an index on tableA (datetime_column) but not an index on tableB. The reverse approach (splitting tableA's column apart instead) might use an index on tableB (date_column, time_column) but not one on tableA. It won't hurt to test both versions. If one table is much larger than the other, then prefer to have the larger table's columns exposed (not cast) so their index might be used.
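A sketch of the two versions, assuming tableA has a single DATETIME column and tableB stores the date and time separately (all column names here are assumptions):

```sql
-- Version 1: functions applied to tableB's side, so the index on
-- tableA (datetime_column) can be probed; tableB's index cannot.
SELECT ...
FROM tableA
JOIN tableB
  ON tableA.datetime_column
   = ADDTIME(tableB.date_column, tableB.time_column);

-- Version 2: functions applied to tableA's side, so the composite
-- index on tableB (date_column, time_column) can be probed instead.
SELECT ...
FROM tableA
JOIN tableB
  ON tableB.date_column = DATE(tableA.datetime_column)
 AND tableB.time_column = TIME(tableA.datetime_column);
```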
If you move to MariaDB (any version 5.3+) or MySQL 5.7 (when it's released), you can define a VIRTUAL column (or two) in one of the two tables to hold this conversion/calculation, which can be persisted and indexed. In 5.5, if efficiency is not good, which is expected with large tables, you could add a computed column yourself, but it would have to be populated during inserts and kept in sync during updates by you (e.g. using triggers).
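A sketch of the computed-column idea in MariaDB syntax (MySQL 5.7 spells the keyword STORED rather than PERSISTENT; the table and column names are assumptions carried over from above):

```sql
-- Materialize the combined value once per row, then index it.
ALTER TABLE tableB
  ADD COLUMN combined_dt DATETIME
      AS (ADDTIME(date_column, time_column)) PERSISTENT,
  ADD INDEX (combined_dt);

-- The join can then compare plain indexed columns on both sides:
--   ... ON tableA.datetime_column = tableB.combined_dt
```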