Mysql – Index optimization and cardinality

index-tuningMySQLmysql-5.5performancequery-performance

I'm by no means a DBA expert. Rather, a Magento developer having to troubleshoot some slow queries. I'll jump right in — here's my attempt to EXPLAIN the troublesome query:

  EXPLAIN
  SELECT `s`.`query_id`,
         `s`.`product_id`,
         SUM(weight) AS `weight`
  FROM
    (SELECT `w1`.`query_id`,
            `w1`.`product_id`,
            SUM(weight) AS `weight`,
            GROUP_CONCAT(data_index) AS `data_index`
     FROM
       (SELECT 1189 AS `query_id`,
               `i`.`entity_id` AS `product_id`,
               `i`.`weight`,
               `i`.`data_index`
        FROM `activo_advancedsearch_weighted_search` AS `i`
        WHERE (i.store_id = 1)
          AND (`i`.`data_index` LIKE '%southern%')
        UNION ALL SELECT 1189 AS `query_id`,
                         `i`.`entity_id` AS `product_id`,
                         `i`.`weight`,
                         `i`.`data_index`
        FROM `activo_advancedsearch_weighted_search` AS `i`
        WHERE (i.store_id = 1)
          AND (`i`.`data_index` LIKE '%proper%')) AS `w1`
     GROUP BY `product_id`) AS `s`
  WHERE (((`s`.`data_index` LIKE '%southern%'
           OR `s`.`data_index` LIKE '%Southern%')
          AND (`s`.`data_index` LIKE '%proper%')))
  GROUP BY `product_id`
  ORDER BY `weight` DESC;

This table has anywhere between 4 and 5 million rows. The server is a dedicated DB server running on MySQL 5.5.42-37.1 and all the tables here will be running under InnoDB storage.

And the explained result:

+----+--------------+------------+------+----------------------------------------------------+----------------------------------------------------+---------+------+---------+----------------------------------------------+
| id | select_type  | table      | type | possible_keys                                      | key                                                | key_len | ref  | rows    | Extra                                        |
+----+--------------+------------+------+----------------------------------------------------+----------------------------------------------------+---------+------+---------+----------------------------------------------+
|  1 | PRIMARY      | <derived2> | ALL  | NULL                                               | NULL                                               | NULL    | NULL |      46 | Using where; Using temporary; Using filesort |
|  2 | DERIVED      | <derived3> | ALL  | NULL                                               | NULL                                               | NULL    | NULL |  428075 | Using filesort                               |
|  3 | DERIVED      | i          | ref  | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | 2       |      | 2663774 | Using where                                  |
|  4 | UNION        | i          | ref  | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID | 2       |      | 2663774 | Using where                                  |
| NULL | UNION RESULT | <union3,4> | ALL  | NULL                                               | NULL                                               | NULL    | NULL |    NULL |                                              |
+----+--------------+------------+------+----------------------------------------------------+----------------------------------------------------+---------+------+---------+----------------------------------------------+

Problems I can identify:

  • Using a temporary table
  • Using filesort
  • The number of rows being read

Update: Also, if it helps, here's the table schema.

CREATE TABLE `activo_advancedsearch_weighted_search` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `entity_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity ID',
  `store_id` smallint(5) unsigned NOT NULL COMMENT 'Store Id',
  `weight` int(10) unsigned NOT NULL DEFAULT '1' COMMENT 'Weight',
  `data_index` text COMMENT 'Data',
  PRIMARY KEY (`id`),
  KEY `IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_ENTITY_ID` (`entity_id`),
  KEY `IDX_ACTIVO_ADVANCEDSEARCH_WEIGHTED_SEARCH_STORE_ID` (`store_id`)
) ENGINE=InnoDB AUTO_INCREMENT=17872647 DEFAULT CHARSET=utf8 COMMENT='Advanced Search Weighted Attribute Index Table'

I've read articles about index optimizations, but to be honest the textbook scenarios did little to help me in the view of a real world example like the above.

I'm not asking you to tell me what indexes I should use (though I won't stop you!). But at least, please take this example to teach me a little bit about how the optimization process would play out in this case.

Best Answer

The query has a lot of bloat:

  • Two group by, the second is completely unnecessary.
  • Three levels of nesting, the last is also completely unnecessary.
  • It uses GROUP_CONCAT() to find all the product_id that have rows with 'southern' and rows with 'proper'. Not the best way in my opinion.
  • The 1189 AS query_id seems to return redundant information (the same value in all the rows of the result.)
  • Using UNION ALL instead of OR may be ok for efficiency but the group by in the next step is probably the performance killer (grouping 400K rows without the use of any index).

Things that are not a problem:

  • Using where; Using temporary; Using filesort This looks bad but sorting 46 rows is not going to be an issue. using filesort in MySQL Explain does not mean that a file is used (not a very good naming choice indeed).

One thing you can do without changing the query:

  • add an index on (store_id, entity_id, data_index, weight).
    This will basically create a copy of the entire table (but a bit narrower as the primary index in InnoDB tables uses some extra hidden space for internal purposes). The query will be able to use the index, doing a partial index scan (only the rows with store_id = 1) and then get all the info needed from the index (no table scan at all). Unfortunately, as the the query is written now, it will still need 2 partial index scans (and then all the unnecessary bloat that follows).

If you can change the query though, there are various options to try:

  • First, we could replace UNION ALL with OR so we only do one table scan (or one index scan if we have the index suggested above) and not two. Also remove the the levels of nesting and only have one, and only one GROUP BY by altering the query logic (removing the GROUP_CONCAT() and using HAVING with counts and sums):

    SELECT 1189 AS query_id,
           i.entiity_id AS product_id,
           SUM(CASE WHEN i.data_index LIKE '%southern%' THEN i.weight ELSE 0 END) 
         + SUM(CASE WHEN i.data_index LIKE '%proper%' THEN i.weight ELSE 0 END)
             AS weight
    FROM activo_advancedsearch_weighted_search AS i
    WHERE i.store_id = 1
      AND ( i.data_index LIKE '%southern%'
         OR i.data_index LIKE '%proper%'
          )
    GROUP BY i.entity_id
    HAVING SUM(i.data_index LIKE '%southern%') > 0
       AND SUM(i.data_index LIKE '%proper%') > 0
    ORDER BY weight DESC;