Mysql – Group by month taking significant time

group bymariadbMySQL

I am trying to do a simple per month of year sum of values on a big (>10M) table.
However, it faces significant performance issues (7 sec with 4G innodb pool size, in local db with 1G innodb pool size it takes over a minute). The script seems fairly simple.

SELECT
  year(date_time),
  month(date_time),
  sum(value)
FROM measurements
WHERE
  source_id = 2
GROUP BY year(date_time), month(date_time)

I have tried using MySQL 5.7 and MariaDB 10.2. Using EXPLAIN in both cases provides:

SIMPLE  measurements    ref source_id,source_date   source_id   4   const   4418476 Using where; Using temporary; Using filesort

It seems I cannot avoid using temporary table. The table has 2 indexes: date_time and source_id, date_time.

Best Answer

  • INDEX(source_id, date_time, value) is optimal, partially because it is "covering". "Covering" means that the query can be completely handled by the columns in the index. I am assuming you did not water down the query?
  • The column tested by = must come first, regardless of cardinality. Else, If date_time is first, it will have to read the entire index.
  • The EXPLAIN estimates that source_id = 2 44% (4418476/10M) of the time. Reading 4M rows is a lot better than 10M.
  • Please provide SHOW CREATE TABLE; without it, I am making guesses in my Answer.
  • Both of your attempted indexes have to bounce between the index BTree and the Data BTree.
  • Probably between 1GB's and 4GB's worth of blocks (16KB - data or index) needed to be pulled into cache (buffer_pool) to satisfy your query. Probably it was entirely in the 4GB buffer_pool when it took 7s. And in the 1GB, there was not room, so it was at least partially I/O-bound.
  • The suggested index will easily fit in 1GB buffer_pool. And only 44% of it is needed for this query.
  • When you add my composite query, get rid of the existing one that is a prefix of it; it will be redundant and unnecessary.
  • As your data grows, someday the index will become too big for 1GB. That's a fact of life.
  • However, by building and maintaining a [Summary table(http://mysql.rjweb.org/doc.php/summarytables), you can make the equivalent query take less than 1 second on either server, even as the table grows. (It would probably have 3 columns: source_id, date (no time), and sum(value).)