I have a table (cumulative_energy_consumptions) defined as:
id int(11) pk
sensor_id int(11) fk
value decimal
recorded_at datetime
with two indexes, one on sensor_id and the other on (sensor_id, recorded_at); the table uses the InnoDB engine.
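For reference, the description above corresponds roughly to the following DDL. This is a sketch, not taken from the original post: the DECIMAL precision, the index names, and the referenced sensors table are assumptions.

```sql
-- Sketch of the schema described above. The precision of `value`,
-- the index/constraint names, and the `sensors` table are assumed.
CREATE TABLE cumulative_energy_consumptions (
    id          INT(11)       NOT NULL AUTO_INCREMENT,
    sensor_id   INT(11)       NOT NULL,
    value       DECIMAL(10,2) NOT NULL,   -- precision assumed
    recorded_at DATETIME      NOT NULL,
    PRIMARY KEY (id),
    KEY idx_sensor (sensor_id),
    KEY idx_sensor_recorded (sensor_id, recorded_at),
    CONSTRAINT fk_sensor FOREIGN KEY (sensor_id) REFERENCES sensors (id)
) ENGINE=InnoDB;
```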
Given a sensor and a datetime, I have to find the row with the maximum 'recorded_at' before that datetime.
It can be solved with the following query:
SELECT MAX(recorded_at)
FROM cumulative_energy_consumptions
WHERE sensor_id = 88 AND recorded_at <= '2016-06-29 00:00:00'
and it is incredibly fast (0.00030 s on my machine), even with a very chatty sensor (id 88) that has more than 300k records.
To get the result for many sensors at once instead of a single sensor, I wrote the following query:
SELECT sensor_id, MAX(recorded_at) AS first_before
FROM cumulative_energy_consumptions
WHERE
sensor_id IN (80, 85, 88, 89)
AND
recorded_at <= '2016-07-10 00:00:00'
GROUP BY sensor_id
and it is also very fast (0.00055 s).
Now I am trying to write the query for one sensor and many datetimes, and I tried this:
SELECT input_times.*, (
SELECT MAX(recorded_at)
FROM cumulative_energy_consumptions
WHERE sensor_id = 88 AND recorded_at <= input_times.instant
) as res
FROM
(
SELECT '2016-06-29 00:00:00' as instant
-- UNION SELECT '2016-06-30 00:00:00' as instant
-- UNION SELECT '2016-07-01 00:00:00' as instant
-- UNION SELECT '2016-07-02 00:00:00' as instant
) as input_times
When the derived table built with the UNIONs has only one row, the query takes 0.0004 s. I expected that with two rows it would take roughly double that, but it completes in 0.2 s; with four rows it takes 0.4 s, and so on.
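The change of strategy can be observed by comparing the optimizer's plans for the one-row and multi-row cases. The following is an illustration of how to do that, not output from the original poster's machine:

```sql
-- Compare the plan for one derived row vs. several. With multiple
-- rows, the correlated subquery is typically reported as a
-- DEPENDENT SUBQUERY, and the "Select tables optimized away" /
-- single index-descent strategy of the one-row case may disappear.
EXPLAIN
SELECT input_times.*, (
    SELECT MAX(recorded_at)
    FROM cumulative_energy_consumptions
    WHERE sensor_id = 88 AND recorded_at <= input_times.instant
) AS res
FROM (
    SELECT '2016-06-29 00:00:00' AS instant
    UNION SELECT '2016-06-30 00:00:00' AS instant
) AS input_times;
```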
Why does MySQL use a different strategy with multiple rows instead of applying the single-row strategy many times? Is there a way to rewrite this query to make it as fast as expected?
Thank you
Best Answer
Try this rewrite. It avoids the GROUP BY and should use the index on (sensor_id, recorded_at) for both the subquery and the join:
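The answer's exact query is not preserved in this excerpt. A sketch in the same spirit, and a common workaround for this MySQL limitation, is to replace the correlated MAX() with an ORDER BY ... LIMIT 1 subquery, which can be satisfied by a single descent of the (sensor_id, recorded_at) index per input row:

```sql
-- Hedged sketch only; the original answer's query is not included in
-- this excerpt. ORDER BY ... LIMIT 1 on the indexed columns lets the
-- optimizer do one backward index probe per instant instead of
-- re-scanning the range for a dependent MAX() subquery.
SELECT input_times.instant,
       ( SELECT recorded_at
         FROM cumulative_energy_consumptions
         WHERE sensor_id = 88
           AND recorded_at <= input_times.instant
         ORDER BY recorded_at DESC
         LIMIT 1
       ) AS res
FROM (
    SELECT '2016-06-29 00:00:00' AS instant
    UNION SELECT '2016-06-30 00:00:00' AS instant
    UNION SELECT '2016-07-01 00:00:00' AS instant
    UNION SELECT '2016-07-02 00:00:00' AS instant
) AS input_times;
```

If the full row (not just recorded_at) is needed, the same subquery can be used as a join condition back onto the table, which is presumably what the answer means by "the join".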