MySQL performance for aggregate function

mysql-5.5 performance query-performance

I'm using MySQL 5.5.

Here is my table:

CREATE TABLE `temperature_information` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `device` int(11) NOT NULL,
   `temperature` int(11) NOT NULL,
   `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`id`),
   KEY `device` (`device`),
   KEY `date` (`date`),
   KEY `idx` (`device`, `date`)
) ENGINE=InnoDB AUTO_INCREMENT=25602738 DEFAULT CHARSET=latin1

This table has ~50 million records.

Here is my query:

SELECT date as ValueDate, MAX(temperature)
FROM (
     SELECT date, temperature FROM temperature_information WHERE device = 1111 
     ORDER BY temperature DESC) c 
GROUP BY DATE(ValueDate),HOUR(ValueDate)

This query returns the maximum temperature for each day. Its execution time is ~0.9 s, while the subquery alone takes 0.003 s.

I have separate indexes on the date and device columns, and a multi-column index idx on (device, date). EXPLAIN says the query uses the device index, which is good. The subquery is very fast.

But to get the MAX temperature for each day I need to use GROUP BY. I know that applying a function to an indexed column prevents the index from being used, but I don't know a workaround that is efficient and produces the same results.

Here is EXPLAIN:

id  select_type  table                    type  possible_keys  key     key_len  ref    rows   Extra
1   SIMPLE       temperature_information  ref   device,idx     device  4        const  29330  Using where; Using temporary; Using filesort

My question:

Is it possible to write a query that is more efficient and produces the same results, or should I process the rows returned by the subquery and find the MAX temperature for each day myself (this would be written in C)?

The subquery returns 20-40k rows on average.

PS. I know the subquery can be removed, but I left it in to make the question clearer.

Creating an index on those three columns improved performance about 20 times. But I have another concern regarding indexes. The table I showed is not the full table; it has other columns, such as signal. I'm planning to run the same kind of query to find the MAX for each day on those columns too. Does adding too many indexes make SELECT slower for other queries? I know that inserts and updates will be slower.

Best Answer

The subquery is not needed at all.
The ORDER BY inside a subquery like this (without a LIMIT) makes no sense.
The outer query has SELECT date but GROUP BY DATE(date), HOUR(date). This, while allowed in older MySQL versions (i.e. before 5.7), is not valid SQL.

I suggest you rewrite:

SELECT 
    DATE(date) AS value_date, 
    HOUR(date) AS value_hour, 
    MAX(temperature) AS max_temperature
FROM temperature_information 
WHERE device = 1111 
GROUP BY DATE(date), HOUR(date) ;

Regarding performance:

An index on (device, date, temperature) will make the query more efficient than the current indexes on (device) alone and on (device, date). If you add this 3-column index, you could drop the other two ("device" and "idx") indexes.
Another option would be to store the date and hour parts in separate columns and add a 4-column index on (device, date_part, hour_part, temperature).
If you move to version 5.7, you could have the date and hour parts as generated columns. See MySQL docs: Generated Columns.
After the comment/edit that there are many more data columns, and since it looks like you'll be running analytic-type queries, another option would be to change the PRIMARY KEY to (device, date) - or (device, date_part, hour_part, min_sec_part). For an InnoDB table, this effectively clusters the data in the order this query needs. Of course, you should first test this alternative design and how it affects other queries, too.
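
As a rough sketch of the first suggestion above (the 3-column index), something like the following could be tried. The index name idx_device_date_temp is made up for illustration, and the two DROP INDEX clauses assume that no other query depends on the old indexes, so check that before running them.

```sql
-- Covering index: the equality column first, then the grouped column,
-- then the aggregated column. The index name is hypothetical.
ALTER TABLE temperature_information
    ADD INDEX idx_device_date_temp (`device`, `date`, `temperature`);

-- Re-check the plan. If the new index is chosen as a covering index,
-- the Extra column should include "Using index".
EXPLAIN
SELECT
    DATE(`date`) AS value_date,
    HOUR(`date`) AS value_hour,
    MAX(temperature) AS max_temperature
FROM temperature_information
WHERE device = 1111
GROUP BY DATE(`date`), HOUR(`date`);

-- Only after confirming that nothing else relies on them:
ALTER TABLE temperature_information
    DROP INDEX `device`,
    DROP INDEX `idx`;
```

The new index starts with device, so it can serve any lookup that the old single-column device index and the (device, date) index served.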

Related Solutions

MySQL optimization – year column grouping – using temporary table, filesort

I don't see a lot of opportunity for improvement.

The index you added was probably a big help, because it's being used for the range matching on the WHERE clause (type => range, key => tran_date), and it's being used as a covering index (extra => using index), avoiding the need to seek into the table to fetch the row data.

But since you're using functions to construct the financial_year value for the group by, both the "using filesort" and "using temporary" can't be avoided. But, those aren't the real problem. The real problem is that you're evaluating MONTH(tran_date) 346,485 times and YEAR(tran_date) at least that many times... ~700,000 function calls in one second doesn't seem too bad.

Plan B: I am definitely not a fan of storing redundant data, and I'm dead-set against making the application responsible for maintaining it... but one option I might be tempted to try would be to create a dashboard_stats_by_financial_year table, and use after-insert/update/delete triggers on the transactions1 table to manage keeping those stats current.

That option has a cost, of course -- adding to the amount of time it takes to update/insert/delete a transaction... but, waiting > 1200 milliseconds for stats for your dashboard is a cost, too. So it may come down to whether you want to pay for it now or pay for it later.
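
To make Plan B more concrete, here is a minimal sketch that keeps only a row count per financial year. Everything beyond the names transactions1, tran_date and dashboard_stats_by_financial_year is an assumption: the tran_count column, the trigger names, and in particular the rule that the financial year starts in April. Adjust them to the real definition.

```sql
-- Hypothetical summary table; the columns are assumptions.
CREATE TABLE dashboard_stats_by_financial_year (
    financial_year SMALLINT UNSIGNED NOT NULL,
    tran_count     INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (financial_year)
) ENGINE=InnoDB;

-- Assumed rule: a financial year starts on 1 April and is labelled
-- with the calendar year in which it ends.
CREATE TRIGGER transactions1_after_insert
AFTER INSERT ON transactions1
FOR EACH ROW
    INSERT INTO dashboard_stats_by_financial_year (financial_year, tran_count)
    VALUES (YEAR(NEW.tran_date) + (MONTH(NEW.tran_date) >= 4), 1)
    ON DUPLICATE KEY UPDATE tran_count = tran_count + 1;

CREATE TRIGGER transactions1_after_delete
AFTER DELETE ON transactions1
FOR EACH ROW
    UPDATE dashboard_stats_by_financial_year
    SET tran_count = tran_count - 1
    WHERE financial_year = YEAR(OLD.tran_date) + (MONTH(OLD.tran_date) >= 4);
```

An after-update trigger would be needed as well if tran_date can change; it would decrement the old year and increment the new one. The dashboard then reads the small summary table instead of aggregating the transactions on every request.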

Mysql – How to optimize indexes on MySQL query with various sorts

WHERE playable_character = 0 AND
    date_published BETWEEN date_sub(now(), INTERVAL 3 YEAR) AND now()

Start with the "=" item, then do the range:

INDEX(playable_character, date_published);

"Pagination", a la ORDER BY rating DESC LIMIT 4000, 1000; is best done by remembering where you "left off". That way, you don't have to scan over the 4000 records that you don't need.
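
A sketch of the "left off" approach, under some assumptions: the table name items and the unique id column are hypothetical (only playable_character, date_published and rating appear in the question), and the two user variables stand for values the application saved from the last row of the previous page.

```sql
-- Hypothetical values remembered from the last row of the previous page.
SET @last_rating = 87, @last_id = 123456;

SELECT id, rating
FROM items                              -- hypothetical table name
WHERE playable_character = 0
  AND date_published BETWEEN DATE_SUB(NOW(), INTERVAL 3 YEAR) AND NOW()
  -- Continue strictly after the last row already shown; id breaks ties
  -- between rows that share the same rating.
  AND (rating < @last_rating
       OR (rating = @last_rating AND id < @last_id))
ORDER BY rating DESC, id DESC
LIMIT 1000;
```

The id tiebreaker matters because rating alone is unlikely to be unique, and without it rows could be skipped or repeated across pages.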
