Mysql – Multi-column and single-column index on the same column

indexMySQLoptimizationperformancequery-performance

I have two indexes on the same column:

  1. store_deleted_idx (store_id, deleted)
  2. store_idx (store_id)

DDL:

CREATE TABLE `stores_shoppers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `store_id` int(11) NOT NULL,
  `store_shopper_id` varchar(128) NOT NULL,
  `country_code` varchar(30) DEFAULT NULL,
  `zip_code` varchar(50) DEFAULT NULL,
  `email` varchar(128) NOT NULL,
  `first_name` varchar(128) DEFAULT NULL,
  `last_name` varchar(128) DEFAULT NULL,
  `modified_on` datetime NOT NULL,
  `is_marketing_allowed` tinyint(1) DEFAULT NULL,
  `created_on` datetime DEFAULT NULL,
  `is_guest` tinyint(1) DEFAULT NULL,
  `phone` varchar(45) DEFAULT NULL,
  `total_spent` decimal(12,2) DEFAULT NULL,
  `orders_count` int(11) DEFAULT NULL,
  `group_id` int(11) DEFAULT NULL,
  `deleted` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `store_idx` (`store_id`),
  KEY `store_deleted_idx` (`store_id`,`deleted`),
  CONSTRAINT `_stores_shoppers_ibfk_2` FOREIGN KEY (`store_id`) REFERENCES `store` (`id`),
  CONSTRAINT `_stores_shoppers_ibfk_4` FOREIGN KEY (`group_id`) REFERENCES `stores_shopper_group` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=56909900 DEFAULT CHARSET=utf8;

My query:

SELECT s.*, t.*, g.*
FROM shoppers s
LEFT JOIN shopper_tag t ON s.id = t.shopper_id
LEFT JOIN shopper_group g ON s.group_id = g.id
WHERE s.store_id = '4494' AND (s.deleted = 0 OR s.deleted IS NULL)
ORDER BY s.created_on DESC
LIMIT 10 OFFSET 0;

If I run the explain, it shows the query is using only store_idx index, but if I remove store_idx index and run the explain, it shows store_deleted_idx is used.

Why isn't store_deleted_idx used in the first case?

I can't remove the single-column index.

Best Answer

Get rid of OR

(s.deleted = 0 OR s.deleted IS NULL)

Make up your mind. Either use 0 (and make it NOT NULL) or use NULL. It is probably better to have

TINYINT UNSIGNED NOT NULL DEFAULT '0'

Until you get rid of the OR, an index including deleted is not useful.

Composite index

After that, INDEX(deleted, store_id), in either order, becomes optimal.

As a rule, INDEX(a) is unnecessary if you also have INDEX(a,b).

Even better is

INDEX(deleted, store_id,   -- in either order
      created_at)          -- last

This will avoid the sorting that is currently necessary. (See EXPLAIN SELECT ...)

I can't remove the single-column index.

Do you mean that your fingers don't work? Your boss will slap your hand if you try? You don't have permissions? It comes back when you do remove it? The FOREIGN KEY gets in the way? What?

Cardinality

Fake news.

Within a composite index, cardinality is irrelevant.

When comparing two indexes, cardinality is important, but there are usually better ways to optimize, such as using a composite index. So, again, fake news.

Back to the question

Your experiments showed that either INDEX(store_id) or INDEX(store_id, deleted) would be used. Given a choice between the two of them, the Optimizer will pick the smaller one (on the presumption that the amount of effort is slightly less). Without a choice (ie, when you had deleted the smaller one), the Optimizer happily picked whichever one is left.

So, to avoid wasting disk space, DROP the shorter (single-column) INDEX. Meanwhile this query will continue to work efficiently.

"Why isn't store_deleted_idx used in the first case?" -- Because of the OR.

More on indexing

http://mysql.rjweb.org/doc.php/index_cookbook_mysql discusses how getting all the way through the WHERE (after getting rid of OR) lets the index get to the ORDER BY, and if you get through the ORDER BY (by tacking on created_by), you can get to LIMIT. Then it will read only 10 rows, not the whole table!