Mysql – Improve performance of query with multiple group by column and count function in select

MySQLmysql-5.6optimizationperformancequery-performance

Brief Background of issue

We have a aggregate table to store rows and this table have currently 3 Million rows and we need to retrieve data from this table. Query for this data is taking too much time.

Data in this table increasing at 1 million rows/month
rate.

We are using MySQL with version 5.6

Table structure is as follows

CREATE TABLE `test` (
  `source_id` bigint(20) unsigned NOT NULL,
  `entity_id` varchar(40) NOT NULL,
  `type` int(11) NOT NULL,
  `channel_type` int(11) NOT NULL,
  `metrics_1` bigint(20) unsigned NOT NULL DEFAULT '0',
  `metrics_2` float unsigned DEFAULT NULL,
  `metrics_3` bigint(20) unsigned DEFAULT NULL,
  `posted_on` datetime NOT NULL COMMENT 'posted on',
  PRIMARY KEY (`source_id`,`posted_on`,`type`,`channel_type`,`entity_id`),
  KEY `test2` (`type`,`competitor_id`,`channel_type`,`posted_on`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50500 PARTITION BY RANGE  COLUMNS(posted_on)
(PARTITION q12010 VALUES LESS THAN ('2010-04-01') ENGINE = InnoDB,
 PARTITION q22010 VALUES LESS THAN ('2010-07-01') ENGINE = InnoDB,
 PARTITION q32010 VALUES LESS THAN ('2010-10-01') ENGINE = InnoDB,
 PARTITION q42010 VALUES LESS THAN ('2011-01-01') ENGINE = InnoDB,
 PARTITION q12011 VALUES LESS THAN ('2011-04-01') ENGINE = InnoDB,
 PARTITION q22011 VALUES LESS THAN ('2011-07-01') ENGINE = InnoDB,
 PARTITION q32011 VALUES LESS THAN ('2011-10-01') ENGINE = InnoDB,
 PARTITION q42011 VALUES LESS THAN ('2012-01-01') ENGINE = InnoDB,
 PARTITION q12012 VALUES LESS THAN ('2012-04-01') ENGINE = InnoDB,
 PARTITION q22012 VALUES LESS THAN ('2012-07-01') ENGINE = InnoDB,
 PARTITION q32012 VALUES LESS THAN ('2012-10-01') ENGINE = InnoDB,
 PARTITION q42012 VALUES LESS THAN ('2013-01-01') ENGINE = InnoDB,
 PARTITION q12013 VALUES LESS THAN ('2013-04-01') ENGINE = InnoDB,
 PARTITION q22013 VALUES LESS THAN ('2013-07-01') ENGINE = InnoDB,
 PARTITION q32013 VALUES LESS THAN ('2013-10-01') ENGINE = InnoDB,
 PARTITION q42013 VALUES LESS THAN ('2014-01-01') ENGINE = InnoDB,
 PARTITION q12014 VALUES LESS THAN ('2014-04-01') ENGINE = InnoDB,
 PARTITION q22014 VALUES LESS THAN ('2014-07-01') ENGINE = InnoDB,
 PARTITION q32014 VALUES LESS THAN ('2014-10-01') ENGINE = InnoDB,
 PARTITION q42014 VALUES LESS THAN ('2015-01-01') ENGINE = InnoDB,
 PARTITION q12015 VALUES LESS THAN ('2015-04-01') ENGINE = InnoDB,
 PARTITION q22015 VALUES LESS THAN ('2015-07-01') ENGINE = InnoDB,
 PARTITION q32015 VALUES LESS THAN ('2015-10-01') ENGINE = InnoDB,
 PARTITION q42015 VALUES LESS THAN ('2016-01-01') ENGINE = InnoDB,
 PARTITION q12016 VALUES LESS THAN ('2016-04-01') ENGINE = InnoDB,
 PARTITION q22016 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */

Primary Key for this table is

PRIMARY KEY (`source_id`,`posted_on`,`type`,`channel_type`,`entity_id`),

Query we are using to retrieve data is

SELECT 
    f.`source_id`, 
    f.`channel_type`, 
    COUNT(f.`entity_id`) AS act_count
FROM `test` AS f USE INDEX (test2)
WHERE 
    f.`source_id` IN (1, 2, 3)
    AND f.`type` = 1 
    AND f.`posted_on` BETWEEN '2014-11-06' AND '2016-02-06'
GROUP BY f.`source_type`, f.`channel_type`;

Index created for this query is

I read about indexes and created this index for the above query

KEY `test2` (`type`,`competitor_id`,`channel_type`,`posted_on`)

Output of Explain is as follows

id: 1
select_type: SELECT
table: f
type: index
possible keys: PRIMARY, test2
key: test2
key_len: 14
ref: NULL
rows: 2912247
Extra: Using where; Using index

Note

we have also create partitions on posted on date which is quarterly partition

PARTITION q22010 VALUES LESS THAN ('2010-07-01') ENGINE = InnoDB
PARTITION q32010 VALUES LESS THAN ('2010-10-01') ENGINE = InnoDB
.
.
.
PARTITION q12016 VALUES LESS THAN ('2016-04-01') ENGINE = InnoDB
PARTITION q22016 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB)

Issue is

This query execution time is 8.39 seconds even with index.

Also we need to apply this query two times with different date range. so it is almost taking 16 seconds to return data.

So Please anyone can help on this how can i get this query execution time down to 2 seconds or less.

Please note number of sources in IN query can be high (30 sources max)

Struggling with this issue from past two days.

Please Let me know if any extra information required.

Any help will be appreciated.

Best Answer

type int(11) NOT NULL COMMENT '1. chatter, 2. marketing', -- Change to TINYINT UNSIGNED. This will save 3 bytes per row. Ditto for several other fields. And for those fields that are also in indexes, especially the PK, that "3" gets multiplied.

bigint(20) unsigned -- If you are not expecting more than 4 billion, save 4 bytes each by switching to INT UNSIGNED.

Your MD5 does not need to be utf8; change the column to CHARACTER SET ascii.

"daily_timestamp = DATE(posted_on)" -- it is not proper to have redundant data like this. Is there any reason not to compute the DATE when you need that?

AND f.`daily_timestamp` BETWEEN '2014-11-06' AND '2016-02-06'

Works fine as

AND f.`posted_on` BETWEEN '2014-11-06' AND '2016-02-06'

except for the ending value, which was 'wrong' in the first place. (Did you really want all '2016-02-06' to be included?) I suggest

AND f.`posted_on` >= '2014-11-06'
AND f.`posted_on`  < '2014-11-06' + INTERVAL 15 MONTH

Is there a 1:1 mapping between this id and name?

`profile_id` varchar(40) NOT NULL COMMENT 'profile id',
`profile_name` varchar(500) DEFAULT NULL COMMENT 'Profile name',

If so, create another table and move profile_name out of here.

Why are your PARTITIONing? It slows down the query you are discussing. Will you be "purging" old data? (That would be a reason to keep it.)

I mentioned several cases of shrinking the footprint of the table; these will speed things up a little because of I/O.

But for the real big speedup, build a Summary table of the monthly COUNTs of whatever is interesting. Then query that table instead. Maintain that table incrementally. The new SELECT will probably run in less than 1 second. Mor on Summary tables.