MySQL – SUM and GROUP BY Optimization

Tags: group-by, MySQL, optimization, sum

I use this query:

 SELECT 
     logdate, 
     SUM(bytes)/1024/1024/1024 as traffic, 
     customer 
 FROM 
     `Logdata` 
 WHERE 
     customer is not null AND 
     logdate >= "2016-07-01" AND 
     logdate <= "2016-07-30" 
 GROUP By logdate, customer

on this table, which currently holds around 6 million rows (and will eventually hold about 10 times as many):

CREATE TABLE `Logdata` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `logdate` date DEFAULT NULL,
  `logtime` time DEFAULT NULL,
  `bytes` int(10) UNSIGNED DEFAULT NULL,
  `uri` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `customer` varchar(128) COLLATE utf8_unicode_ci DEFAULT NULL,
  `method` smallint(6) DEFAULT '200',
  `region` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

with these keys:

  ADD PRIMARY KEY (`id`),
  ADD KEY `IDX_logdate` (`logdate`),
  ADD KEY `IDX_method` (`method`),
  ADD KEY `IDX_customer` (`customer`),
  ADD KEY `IDX_customer_logdate` (`logdate`,`customer`);

and this execution plan:

           id: 1
  select_type: SIMPLE
        table: Logdata
   partitions: NULL
         type: index
possible_keys: IDX_logdate,IDX_customer,idx_customer_logdate
          key: idx_customer_logdate
      key_len: 391
          ref: NULL
         rows: 6247535
     filtered: 25.00
        Extra: Using where

InnoDB Config is:

InnoDB buffer pool / data size: 2.0G/1.1G
InnoDB buffer pool instances: 2

With ~6 million rows in the table the query takes 14 seconds, and this will only get worse as the number of rows keeps growing at a fast pace.

The questions:

  • Would I benefit in this case from the MyISAM table engine?

  • What could I do to further optimize the query or the settings?

Best Answer

Would I benefit in this case from the MyISAM table engine?

No, you won't. The time is spent reading and aggregating millions of rows, and switching the storage engine does not change that work.

What could I do to further optimize the query or the settings?

Materialize the query.

MySQL does not have built-in means to do that easily (something comparable to indexed views in SQL Server or materialized views in Oracle), so you'll have to put some effort into it.

Create a table like this:

CREATE TABLE log_customer_date
        (
        customer VARCHAR(128) NOT NULL,   -- same type as Logdata.customer
        logdate DATE NOT NULL,
        sumbytes BIGINT NOT NULL,
        countrecords BIGINT NOT NULL,
        PRIMARY KEY (customer, logdate)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

fill it:

INSERT
INTO    log_customer_date
SELECT  customer, logdate, SUM(bytes) sumbytes, COUNT(*) countrecords
FROM    Logdata
WHERE   customer IS NOT NULL
GROUP BY
        customer, logdate

and keep it up to date with an AFTER INSERT trigger on Logdata (a full trigger definition is sketched below):

INSERT
INTO    log_customer_date (customer, logdate, sumbytes, countrecords)
VALUES  (NEW.customer, NEW.logdate, NEW.bytes, 1)
ON DUPLICATE KEY
UPDATE  sumbytes = sumbytes + VALUES(sumbytes),
        countrecords = countrecords + VALUES(countrecords)
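
For completeness, a minimal sketch of the full trigger definition wrapping the statement above; the trigger name trg_logdata_summary is an arbitrary choice, and the IF guard keeps rows without a customer out of the summary table, matching its NOT NULL column and the filter in the original query:

DELIMITER $$
CREATE TRIGGER trg_logdata_summary
AFTER INSERT ON Logdata
FOR EACH ROW
BEGIN
    -- only rows with a customer go into the summary table
    IF NEW.customer IS NOT NULL THEN
        INSERT INTO log_customer_date (customer, logdate, sumbytes, countrecords)
        VALUES  (NEW.customer, NEW.logdate, NEW.bytes, 1)
        ON DUPLICATE KEY UPDATE
                sumbytes = sumbytes + VALUES(sumbytes),
                countrecords = countrecords + VALUES(countrecords);
    END IF;
END$$
DELIMITER ;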

or in a script every once in a while:

INSERT
INTO    log_customer_date
SELECT  customer, logdate, SUM(bytes) sumbytes, COUNT(*) countrecords
FROM    Logdata
WHERE   id > :lastid
        AND customer IS NOT NULL
GROUP BY
        customer, logdate
ON DUPLICATE KEY
UPDATE  sumbytes = sumbytes + VALUES(sumbytes),
        countrecords = countrecords + VALUES(countrecords)

Record the highest id as of the moment of the insert somewhere, so the next run knows where to pick up.
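
A minimal sketch of that bookkeeping, assuming a small helper table summary_state that is not part of the original schema:

-- hypothetical one-row bookkeeping table (not part of the original schema)
CREATE TABLE IF NOT EXISTS summary_state
        (
        name VARCHAR(32) NOT NULL,
        lastid BIGINT UNSIGNED NOT NULL,
        PRIMARY KEY (name)
        );

INSERT INTO summary_state VALUES ('log_customer_date', 0)
ON DUPLICATE KEY UPDATE lastid = lastid;

-- in each run: read the old high-water mark and remember the current one
-- before aggregating, so rows inserted during the run are not missed
SET @lastid = (SELECT lastid FROM summary_state WHERE name = 'log_customer_date');
SET @newid  = (SELECT COALESCE(MAX(id), 0) FROM Logdata);

-- ... run the INSERT ... ON DUPLICATE KEY UPDATE statement above, with
--     WHERE id > @lastid AND id <= @newid AND customer IS NOT NULL ...

UPDATE summary_state SET lastid = @newid WHERE name = 'log_customer_date';

Once the summary table is filled and kept current, the original report becomes a cheap read from it; a sketch of the rewritten query (no GROUP BY is needed because the primary key already guarantees one row per customer and date):

 SELECT
     logdate,
     sumbytes/1024/1024/1024 as traffic,
     customer
 FROM
     log_customer_date
 WHERE
     logdate >= "2016-07-01" AND
     logdate <= "2016-07-30"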