MySQL Index Tuning – How to Remove Multiple Redundant Indexes

indexindex-tuningMySQLschema

I have a legacy app.
I wanted to find out which tables are the largest ones using script from Percona blog. It turned out some of my tables have indices that are larger than actual data by factor of 3 and more.
I've dumped schema off the MySql server and found that some tables have lots of what seem to be redundant indices.

Here's how one of the worst tables is created:

CREATE TABLE `pictures_relations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `pic_id` int(10) DEFAULT NULL,
  `url_id` int(10) DEFAULT NULL COMMENT 'id урла, с которого файл был/будет скачан',
  `is_stub` tinyint(1) NOT NULL DEFAULT '0',
  `item_id` int(11) NOT NULL,
  `module_id` int(11) NOT NULL,
  `item_type` varchar(8) NOT NULL,
  `is_file` tinyint(1) NOT NULL,
  `is_hide` tinyint(1) NOT NULL,
  `is_cover` tinyint(1) NOT NULL,
  `image_only` tinyint(1) DEFAULT NULL,
  `is_custom` tinyint(1) NOT NULL DEFAULT '0',
  `skip` tinyint(1) NOT NULL DEFAULT '0',
  `order` int(11) NOT NULL,
  `author` int(11) NOT NULL,
  `add_date` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `is_hide` (`is_hide`),
  KEY `is_cover` (`is_cover`),
  KEY `add_date` (`add_date`),
  KEY `item_id_2` (`item_id`,`module_id`,`item_type`,`is_file`,`is_cover`),
  KEY `order` (`item_id`,`module_id`,`item_type`,`is_file`,`order`),
  KEY `item_id` (`item_id`,`module_id`,`item_type`,`is_file`,`is_custom`),
  KEY `pic_id` (`pic_id`),
  KEY `item_id_3` (`item_id`,`module_id`,`item_type`,`is_file`,`skip`),
  KEY `item_id_4` (`item_id`,`module_id`,`item_type`,`is_file`),
  KEY `pic_id_2` (`pic_id`,`skip`),
  KEY `skip` (`skip`,`pic_id`),
  KEY `stub_url` (`is_stub`,`url_id`),
  KEY `stub_skip_url` (`is_stub`,`skip`,`url_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6794944 DEFAULT CHARSET=utf8;

I can clearly see that indices order, item_id, item_id_2, item_id_3 and item_id_4 are mostly similar, but they differ a bit. Using convenience views for easily detecting redundant indexes from this DBA.StackExchange answer helps, but not much — it does show that item_id_4 is redundant to all others, but doesn't help with splitting/merging them into more sane indexes.

The question is:
Is there a Rule of Thumb to refactor these indexes into something sane, with less duplication?

Like, if I have idx1('a', 'b', 'c'), idx2('a', 'b', 'd'), idx3('a', 'b', 'e'), then I have to make it into idx1('a', 'b'), idx2('c'), idx3('d') and idx4('e')?

Best Answer

The question cannot be answered without seeing the SELECTs.

Here are some partial answers...

  • The only obviously redundant queries are pick_id and item_id_4. The rule is simple: If all the column(s) of an index are exactly the first column (s) of some other index, then toss it. (Note: This applies only to non-unique indexes.)

  • The next set of apparent duds are (is_*). These all seem to be true/false "flags". The Optimizer won't use such (with rare exceptions). It will, on the other hand, consider INDEX(is_foo, ...). That is a composite index starting with a flag. (And, no, the low cardinality of the first column is not bad.)

  • 5 columns in an index is pushing the max that it "should" have. I would be tempted to keep one of the 5-column indexes starting with (item_id,module_id,item_type,is_file). The ones I tossed would still be benefit, at some level, by the one I keep.

  • Reordering the columns depends on what the SELECTs look like. If the user can request "any" combination of columns to filter, then punt. Well, not totally punt... Make a few combinations of 2 or 3 columns.

  • INDEX(a,b,c) is not the same as INDEX(a,b), INDEX(c). Some queries benefit from one; some from the other.

See also: Cookbook on generating indexes