MySQL Performance – DATE vs. time_field in WHERE Clause

MySQLperformancequery-performance

I have 3 tables:

users
movements
unique_ids

I want to get the records gotten from march until now of the users registered in march, counting the unique id of their movements with a distinct and the total of it at the end:

SELECT MONTH(mo.ctime) AS month, COUNT(DISTINCT(unique_id))
FROM users us, movements mo,unique_ids id
WHERE us.time_field > '2014-03-01' AND us.time_field < '2014-03-31 23:59:59'
AND mo.time_field > '2014-03-01'
AND us.room_name LIKE 'name'
AND us.room_name=mo.room_name AND us.user=mo.user AND id.id_log=mo.id_log
GROUP BY month WITH ROLLUP

The thing is that few month ago in other query I realise that adding DATE() to the time_field into WHERE clause makes faster the query, but adding this into the query shown, gives me different results than the initial one.

What's the difference between time_field > 'x' AND DATE(time_field) > 'x' inside WHERE clause?

Best Answer

Just addressing this question

What's the difference between time_field > 'x' AND DATE(time_field) > 'x' inside WHERE clause?

The difference is how the Query Optimizer treats them.

  • If the WHERE clause has time_field > 'x', this signals the Query Optimizer to try looking for any index so as to take advantage of it doing a range scan.
  • If the WHERE clause has DATE(time_field) > 'x', this signals the Query Optimizer to throw all indexes under the bus because the DATE function has to be called across the whole table (in the form of a full table scan) or the join result.

Depending on the key distribution of time_field, DATE(time_field) > 'x' triggering a full table scan just happens to be better than a range scan on an index if the values make up a significant percentage of the index. This is even more true using InnoDB because a full table scan passes through the clustered index (where PRIMARY KEY and row data coexist), while a secondary index on time_field would cause a lookup of the secondary index in addition to the clustered index. If such a secondary index had a lopsided key distribution, such an index would be ignored in favor of a full table scan anyway.

This conjuecture is only indicative of your current dataset. Someone else's dataset may have a better (evenly distributed, more balanced) key distribution, resulting in time_field > 'x' working better than DATE(time_field) > 'x'.

To see such differences, run the EXPLAIN on both SELECT queries. The results may be different.