MySQL query slows down

Tags: MySQL, performance, query-performance

I have a table (cumulative_energy_consumptions) defined as:

id int(11) pk
sensor_id int(11) fk
value decimal
recorded_at datetime

It has two indexes, one on sensor_id and one on (sensor_id, recorded_at), and it uses the InnoDB engine.
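For reference, the description above corresponds to DDL along these lines. This is only a sketch: the index names, the DECIMAL precision, the NOT NULL constraints and AUTO_INCREMENT are assumptions, and the foreign key is left out because the referenced table is not shown.

```sql
-- Hypothetical reconstruction of the table described above.
CREATE TABLE cumulative_energy_consumptions (
    id          INT(11)        NOT NULL AUTO_INCREMENT,  -- AUTO_INCREMENT is assumed
    sensor_id   INT(11)        NOT NULL,                 -- foreign key in the real schema
    value       DECIMAL(10, 2),                          -- precision and scale are assumed
    recorded_at DATETIME       NOT NULL,
    PRIMARY KEY (id),
    KEY idx_sensor (sensor_id),                          -- index name is assumed
    KEY idx_sensor_recorded (sensor_id, recorded_at)     -- index name is assumed
) ENGINE = InnoDB;
```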

Given a sensor and a datetime, I have to find the row with the latest recorded_at at or before that datetime.
It can be solved with the following query:

    SELECT MAX(recorded_at)
    FROM cumulative_energy_consumptions
    WHERE sensor_id = 88 AND recorded_at <= '2016-06-29 00:00:00' 

and it is incredibly fast (0.00030 s on my machine), even with a very chatty sensor (id 88) that has more than 300k records.

Instead of a single sensor, I wrote the following query to get the result for many sensors at once:

    SELECT sensor_id, MAX(recorded_at) AS first_before
            FROM cumulative_energy_consumptions
            WHERE 
                sensor_id IN (80, 85, 88, 89) 
                AND 
                recorded_at <= '2016-07-10 00:00:00' 
            GROUP BY sensor_id

and it is also very fast (0.00055 s).

Next I tried to write the query for one sensor and many datetimes:

    SELECT input_times.*, (
            SELECT MAX(recorded_at)
            FROM cumulative_energy_consumptions
            WHERE sensor_id = 88 AND recorded_at <= input_times.instant
        ) AS res
    FROM
        (
            SELECT '2016-06-29 00:00:00' AS instant
    --      UNION SELECT '2016-06-30 00:00:00' AS instant
    --      UNION SELECT '2016-07-01 00:00:00' AS instant
    --      UNION SELECT '2016-07-02 00:00:00' AS instant
        ) AS input_times

When the derived table built from the UNIONs has only one row, the query takes 0.0004 s. I expected that with two rows it would take roughly double that, but it completes in 0.2 s; with 4 rows it takes 0.4 s, and so on.
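One way to see which strategy the optimizer picks in each case is to prefix the query with EXPLAIN and compare the plan for one input row against the plan for two. The sketch below shows the two-row variant; I have not included any output. The columns worth comparing are select_type, key, rows and Extra for the subquery on cumulative_energy_consumptions, for example whether Extra reports "Select tables optimized away" in one case but not in the other.

```sql
-- Sketch: plan for the two-row case. Run it again with the second
-- UNION branch removed and compare the two plans.
EXPLAIN
SELECT input_times.*, (
        SELECT MAX(recorded_at)
        FROM cumulative_energy_consumptions
        WHERE sensor_id = 88 AND recorded_at <= input_times.instant
    ) AS res
FROM
    (
        SELECT '2016-06-29 00:00:00' AS instant
        UNION SELECT '2016-06-30 00:00:00' AS instant
    ) AS input_times;
```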

Why does MySQL use a different strategy with multiple rows instead of applying the single-row strategy once per row? Is there a way to rewrite this query to make it as fast as expected?

Thank you

Best Answer

Try this rewrite. It avoids the GROUP BY and should use the index on (sensor_id, recorded_at) for both the subquery and the join:

    SELECT 
        input_times.instant, 
        c.sensor_id, 
        c.recorded_at AS first_before,
        c.value 
    FROM 
        ( SELECT 80 AS sensor_id UNION ALL
          SELECT 85  UNION ALL
          SELECT 88  UNION ALL
          SELECT 89
        ) AS s
      CROSS JOIN
        ( SELECT '2016-06-29 00:00:00' AS instant 
          UNION ALL 
          ...
        ) AS input_times
      JOIN  
        cumulative_energy_consumptions AS c
      ON  c.sensor_id = s.sensor_id
      AND c.recorded_at = ( SELECT ci.recorded_at 
                            FROM cumulative_energy_consumptions AS ci
                            WHERE ci.sensor_id = s.sensor_id
                              AND ci.recorded_at <= input_times.instant 
                            ORDER BY ci.recorded_at DESC
                            LIMIT 1
                          ) ;
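If only the single-sensor, many-datetimes case from the question is needed, the same ORDER BY ... LIMIT 1 subquery can replace MAX() directly in the original query shape. This is a minimal sketch with two example instants taken from the question; I have not timed it, so whether it reaches the single-row speed on your data is something to verify with EXPLAIN.

```sql
-- Sketch: one sensor (88), several instants, no GROUP BY and no MAX().
SELECT
    input_times.instant,
    ( SELECT ci.recorded_at
      FROM cumulative_energy_consumptions AS ci
      WHERE ci.sensor_id = 88
        AND ci.recorded_at <= input_times.instant
      ORDER BY ci.recorded_at DESC
      LIMIT 1
    ) AS res
FROM
    ( SELECT '2016-06-29 00:00:00' AS instant
      UNION ALL
      SELECT '2016-06-30 00:00:00'
    ) AS input_times;
```

Unlike the join version, this returns a row with a NULL res for any instant that has no earlier record, rather than dropping that instant.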