MySQL – Optimize WHERE IN + ORDER BY and Avoid Filesort

indexMySQLoptimizationquery-performance

This is my table definition, without any special index for now:

CREATE TABLE `filter` (
  `field_name` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
  `category_id` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
  `position` smallint(6) NOT NULL,
  `options` longtext COLLATE utf8_unicode_ci COMMENT '(DC2Type:json_array)',
  PRIMARY KEY (`field_name`,`category_id`),
  KEY `IDX_702C956612469DE2` (`category_id`),
  CONSTRAINT `FK_702C956612469DE2` 
      FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Sample data:

+------------+-----------------------+-------------+----------+
| field_name |        options        | category_id | position |
+------------+-----------------------+-------------+----------+
| color      | {"label": "Color"}    |        1895 |        1 |
| material   | {"label": "Material"} |        1895 |        2 |
| color      | {"label": "Color"}    |        1896 |        1 |
| color      | {"label": "Color"}    |        1897 |        1 |
+------------+-----------------------+-------------+----------+

Most of the time I do a query with WHERE IN and ORDER BY position and selecting only field_name and options:

EXPLAIN SELECT 
  field_name
  options
FROM 
  filter
WHERE 
  category_id IN ('1895', '1896', '1897') 
ORDER BY 
  position ASC

This gives me Using index condition; Using filesort, quite bad uh?

{
   "query_block":{
      "select_id":1,
      "ordering_operation":{
         "using_filesort":true,
         "table":{
            "table_name":"filter",
            "access_type":"range",
            "possible_keys":[
               "IDX_702C956612469DE2"
            ],
            "key":"IDX_702C956612469DE2",
            "used_key_parts":[
               "category_id"
            ],
            "key_length":"98",
            "rows":5,
            "filtered":100,
            "index_condition":"(`filter`.`category_id` in ('1895','1896','1897'))"
         }
      }
   }
}

With an index on category_id, position Using where; Using index; Using filesort:

{
   "query_block":{
      "select_id":1,
      "ordering_operation":{
         "using_filesort":true,
         "table":{
            "table_name":"filter",
            "access_type":"range",
            "possible_keys":[
               "IDX_702C956612469DE2",
               "IDX_CATEGORY_POSITION"
            ],
            "key":"IDX_CATEGORY_POSITION",
            "used_key_parts":[
               "category_id"
            ],
            "key_length":"98",
            "rows":5,
            "filtered":100,
            "using_index":true,
            "attached_condition":"(`filter`.`category_id` in ('1895','1896','1897'))"
         }
      }
   }
}

So which explain is better? I would say the first because of using index condition (there is no "where" part). Is there any way to avoid the filesort?

Best Answer

Given the structure of multicolumn B-tree index it is not viable to do a sort on position when using IN on category_id. But as the test data suggest the position is not "global" but seems to have a meaning only for given category_id. So as I suggested in comments, it is instead possible to ORDER BY (category_id, position) - that can use two-column index on those columns to get it without filesort. http://sqlfiddle.com/#!9/aabaa/7

Because of the longtext column it is not possible to turn this into an index-only scan (text columns can be only indexed on prefix, not on entire value, as index key length is limited).

Using varchar fields as primary key has some drawbacks, but it is not "wrong" by itself so just a little suggestion - would it make sense in your desing to use some integer ids instead?