MySQL performance for aggregate function

mysql-5.5performancequery-performance

I'm using mysql 5.5

Here is my table:

CREATE TABLE `temperature_information` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `device` int(11) NOT NULL,
   `temperature` int(11) NOT NULL,
   `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`id`),
   KEY `device` (`device`),
   KEY `date` (`date`),
   KEY `idx` (`device`, `date`)
) ENGINE=InnoDB AUTO_INCREMENT=25602738 DEFAULT CHARSET=latin1

This table has ~50m. records.

Here is my query:

SELECT date as ValueDate, MAX(temperature)
FROM (
     SELECT date, temperature FROM temperature_information WHERE device = 1111 
     ORDER BY temperature DESC) c 
GROUP BY DATE(ValueDate),HOUR(ValueDate) 

This query returns Maximum temperature for each day. It's execution time is ~0.9 s. and subquery's time 0.003 sec.

I have separate indexes on the date and device columns and a multi-column index idx on device,date. The EXPLAIN says it uses device index for query, which is good. The sub query is very fast.

But to get MAX temperature for each day I need to use GROUP BY. I know that functions on indexed column disables the use of indexes, but I don't know a workaround to make it efficient and produce same results.

Here is EXPLAIN:

id  select_type     table           type    possible_keys   key     key_len  ref    rows    Extra   
1   SIMPLE  temperature_information ref     device,idx    device      4      const  29330   Using where; Using temporary; Using filesort

My question:

Is it possible to write a query which would be more efficient and would produce same results or should I process rows returned by sub query and find MAX temperature for each day myself (This would be written in c)?

Sub query returns 20-40k rows on average.

PS. I know you can remove sub query, but I left it like this to clarify question


Creating index on those (3) columns improved performance ~20 times. But I have another concern regarding indexes. I provided, not full table. table have another columns like signal, etc. I'm planning to do same queries to find MAX for each day on these columns too. Does adding too many indexes make SELECT slower for other queries. I know that inserts and updates will be slower.

Best Answer

  • The subquery is not needed at all.
  • The ORDER BY inside a subquery like this (without a LIMIT) makes no sense.
  • The external query has SELECT date but GROUP BY DATE(date). HOUR(date). This, while allowed in MySQL older versions (i.e. before 5.7), is not valid SQL.

I suggest you rewrite:

SELECT 
    DATE(date) AS value_date, 
    HOUR(date) AS value_hour, 
    MAX(temperature) AS max_temperature
FROM temperature_information 
WHERE device = 1111 
GROUP BY DATE(date), HOUR(date) ;

Regarding performance:

  • an index on (device, date, temperature) will make the query more efficient than the current indexes on (device) alone and on (device, date). If you add this 3-column index, you could drop the other two ("device" and "idx") indexes.
  • Another option would be to store the date and hour part in separate columns and add a 4-column index on (device, date_part, hour_part, temperature).
  • If you move to version 5.7, you coul dhave the date and hour parts as generated columns. See MySQL docs: Ganerated Columns.
  • After the comment/edit that there are many more data columns and since it looks like you'll be running analytic type of queries, another option would be to change the PRIMARY KEY to (device, date) - or (device, date_part, hour_part, min_sec_part). This effectively - for an InnoDB table - clusters the data in the way you want them for this query. Of course you should test first this alternative design and how it will affect other queries, too.