I'm by no means a DBA expert. Rather, a Magento developer having to troubleshoot some slow queries. I'll jump right in — here's my attempt to EXPLAIN
the troublesome query:
EXPLAIN
SELECT `s`.`query_id`,
`s`.`product_id`,
SUM(weight) AS `weight`
FROM
(SELECT `w1`.`query_id`,
`w1`.`product_id`,
SUM(weight) AS `weight`,
GROUP_CONCAT(data_index) AS `data_index`
FROM
(SELECT 1189 AS `query_id`,
`i`.`entity_id` AS `product_id`,
`i`.`weight`,
`i`.`data_index`
FROM `activo_advancedsearch_weighted_search` AS `i`
WHERE (i.store_id = 1)
AND (`i`.`data_index` LIKE '%southern%')
UNION ALL SELECT 1189 AS `query_id`,
`i`.`entity_id` AS `product_id`,
`i`.`weight`,
`i`.`data_index`
FROM `activo_advancedsearch_weighted_search` AS `i`
WHERE (i.store_id = 1)
AND (`i`.`data_index` LIKE '%proper%')) AS `w1`
GROUP BY `product_id`) AS `s`
WHERE (((`s`.`data_index` LIKE '%southern%'
OR `s`.`data_index` LIKE '%Southern%')
AND (`s`.`data_index` LIKE '%proper%')))
GROUP BY `product_id`
ORDER BY `weight` DESC;
This table has anywhere between 4 and 5 million rows. The server is a dedicated DB server running on MySQL 5.5.42-37.1 and all the tables here will be running under InnoDB storage.
And the explained result:
+----+--------------+------------+------+----------------------------------------------------+----------------------------------------------------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+----------------------------------------------------+----------------------------------------------------+---------+------+---------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 46 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 428075 | Using filesort |
| 3 | DERIVED | i | ref | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | 2 | | 2663774 | Using where |
| 4 | UNION | i | ref | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | 2 | | 2663774 | Using where |
| NULL | UNION RESULT | <union3,4> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+----------------------------------------------------+----------------------------------------------------+---------+------+---------+----------------------------------------------+
Problems I can identify:
- Using a temporary table
- Using filesort
- The number of rows being read
Update: Also, if it helps, here's the table schema.
CREATE TABLE `activo_advancedsearch_weighted_search` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'ID',
`entity_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity ID',
`store_id` smallint(5) unsigned NOT NULL COMMENT 'Store Id',
`weight` int(10) unsigned NOT NULL DEFAULT '1' COMMENT 'Weight',
`data_index` text COMMENT 'Data',
PRIMARY KEY (`id`),
KEY `IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_ENTITY_ID` (`entity_id`),
KEY `IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID` (`store_id`)
) ENGINE=InnoDB AUTO_INCREMENT=17872647 DEFAULT CHARSET=utf8 COMMENT='Advanced Search Weighted Attribute Index Table'
I've read articles about index optimizations, but to be honest the textbook scenarios did little to help me in the view of a real world example like the above.
I'm not asking you to tell me what indexes I should use (though I won't stop you!). But at least, please take this example to teach me a little bit about how the optimization process would play out in this case.
Best Answer
The query has a lot of bloat:
group by
, the second is completely unnecessary.GROUP_CONCAT()
to find all theproduct_id
that have rows with'southern'
and rows with'proper'
. Not the best way in my opinion.1189 AS query_id
seems to return redundant information (the same value in all the rows of the result.)UNION ALL
instead ofOR
may be ok for efficiency but thegroup by
in the next step is probably the performance killer (grouping 400K rows without the use of any index).Things that are not a problem:
Using where; Using temporary; Using filesort
This looks bad but sorting 46 rows is not going to be an issue.using filesort
in MySQL Explain does not mean that a file is used (not a very good naming choice indeed).One thing you can do without changing the query:
(store_id, entity_id, data_index, weight)
.This will basically create a copy of the entire table (but a bit narrower as the primary index in InnoDB tables uses some extra hidden space for internal purposes). The query will be able to use the index, doing a partial index scan (only the rows with
store_id = 1
) and then get all the info needed from the index (no table scan at all). Unfortunately, as the the query is written now, it will still need 2 partial index scans (and then all the unnecessary bloat that follows).If you can change the query though, there are various options to try:
First, we could replace
UNION ALL
withOR
so we only do one table scan (or one index scan if we have the index suggested above) and not two. Also remove the the levels of nesting and only have one, and only oneGROUP BY
by altering the query logic (removing theGROUP_CONCAT()
and usingHAVING
with counts and sums):