I am trying to compute a simple per-month sum of values on a big (>10M rows) table.
However, the query has significant performance issues: it takes 7 seconds with a 4G InnoDB buffer pool, and over a minute on a local database with a 1G InnoDB buffer pool. The query seems fairly simple.
SELECT
    year(date_time),
    month(date_time),
    sum(value)
FROM measurements
WHERE
    source_id = 2
GROUP BY year(date_time), month(date_time)
I have tried using MySQL 5.7 and MariaDB 10.2. Using EXPLAIN in both cases provides:
select_type: SIMPLE
table: measurements
type: ref
possible_keys: source_id,source_date
key: source_id
key_len: 4
ref: const
rows: 4418476
Extra: Using where; Using temporary; Using filesort
It seems I cannot avoid "Using temporary". The table has 2 indexes: date_time and (source_id, date_time).
Best Answer
INDEX(source_id, date_time, value) is optimal, partially because it is "covering". "Covering" means that the query can be completely handled by the columns in the index. I am assuming you did not water down the query?
The column tested with = (source_id) must come first, regardless of cardinality. Otherwise, if date_time is first, the query will have to read the entire index.
EXPLAIN estimates that source_id = 2 occurs 44% (4418476/10M) of the time. Reading 4M rows is a lot better than 10M.
Please provide SHOW CREATE TABLE; without it, I am making guesses in my Answer.
(... source_id, date (no time), and sum(value).)
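As a rough sketch of the index suggestion above, something like the following could be tried. It assumes the table and column names from the question (measurements, source_id, date_time, value). The index name idx_source_date_value is made up for illustration, and nothing here has been checked against the real schema, since SHOW CREATE TABLE was not provided.

```sql
-- Sketch only: table and column names are taken from the question;
-- the index name idx_source_date_value is hypothetical.
ALTER TABLE measurements
    ADD INDEX idx_source_date_value (source_id, date_time, value);

-- Re-check the plan. If the new index is chosen and acts as a covering
-- index, the Extra column should include "Using index".
EXPLAIN
SELECT YEAR(date_time), MONTH(date_time), SUM(value)
FROM measurements
WHERE source_id = 2
GROUP BY YEAR(date_time), MONTH(date_time);
```

"Using temporary" may still show up, because the GROUP BY is on expressions rather than on the indexed column itself. The expected gain is that the query reads only the index entries for source_id = 2 and never touches the table rows.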