Mysql – Indexing is not working

indexindex-statisticsindex-tuningmysql-5

Following is the table and index information for which I have the queries at the end.

Create Table: CREATE TABLE `food` (
  `id` int(10) unsigned NOT NULL DEFAULT '0',
  `page_no` int(11) DEFAULT NULL,
  `nid` int(11) DEFAULT NULL,
  `name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `source` varchar(4) CHARACTER SET utf8 NOT NULL,
  `brand` varchar(500) CHARACTER SET utf8 DEFAULT NULL,
  `cuisine` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  `subcuisine` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  `type` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `per_nr` float DEFAULT NULL,
  `category` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  `source_category` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `source_parent_category` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  KEY `id_idx` (`id`),
  KEY `page_no_idx` (`page_no`),
  KEY `source_idx` (`source`),
  KEY `nid_idx` (`nid`),
  KEY `per_nr_idx` (`per_nr`),
  KEY `per_nr_source_category_nid_idx` (`per_nr`,`source_category`,`nid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

EXPLAIN 1

mysql > EXPLAIN SELECT food.nid, name, brand, cuisine, source_category, subcuisine, course, per_nr  FROM food where category is not null and  per_nr is not null and nid is not null;

+----+-------------+-------+------+---------------------------------------------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys                                     | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+---------------------------------------------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | food  | ALL  | nid_idx,per_nr_idx,per_nr_source_category_nid_idx | NULL | NULL    | NULL | 532752 | Using where |
+----+-------------+-------+------+---------------------------------------------------+------+---------+------+--------+-------------+

EXPLAIN 2

mysql > EXPLAIN SELECT name, nid, brand, cuisine, type, subcuisine, course, per_nr, source, source_category FROM food where per_nr>47 and source_category='Cereal Grains and Pasta' and food.nid!=205014 and food.nid is not null;

+----+-------------+-------+------+---------------------------------------------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys                                     | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+---------------------------------------------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | food  | ALL  | nid_idx,per_nr_idx,per_nr_source_category_nid_idx | NULL | NULL    | NULL | 532752 | Using where |
+----+-------------+-------+------+---------------------------------------------------+------+---------+------+--------+-------------+
1 row in set (0.09 sec)

Queries:

  1. Explain 1: As we can see the query contains column name which are indexed as well, but still explain show that the mysql was not able to identify key to make it faster.
  2. Explain 2: I have created combined index per_nr_source_category_nid_idx in the same order in which the fields are available in where clause but still the mysql couldn't identify the index.

Although I have created indexes but still why system is not able to use it. This is my question in both the above queries.

Detailed Index details

mysql> SHOW INDEX FROM food;

+-------+------------+--------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name                       | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+--------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| food  |          1 | id_idx                         |            1 | id              | A         |      822286 |     NULL | NULL   |      | BTREE      |         |               |
| food  |          1 | page_no_idx                    |            1 | page_no         | A         |      822286 |     NULL | NULL   | YES  | BTREE      |         |               |
| food  |          1 | source_idx                     |            1 | source          | A         |          27 |     NULL | NULL   |      | BTREE      |         |               |
| food  |          1 | nid_idx                        |            1 | nid             | A         |      822286 |     NULL | NULL   | YES  | BTREE      |         |               |
| food  |          1 | per_nr_idx                     |            1 | per_nr          | A         |          27 |     NULL | NULL   | YES  | BTREE      |         |               |
| food  |          1 | per_nr_source_category_nid_idx |            1 | per_nr          | A         |          27 |     NULL | NULL   | YES  | BTREE      |         |               |
| food  |          1 | per_nr_source_category_nid_idx |            2 | source_category | A         |       12092 |     NULL | NULL   | YES  | BTREE      |         |               |
| food  |          1 | per_nr_source_category_nid_idx |            3 | nid             | A         |      822286 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+--------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
8 rows in set (4.35 sec)

Best Answer

The order you put the conditions in the WHERE clause doesn't matter. But whether the conditions are = or <> or IS NULL does matter in combination with the order of columns in composite indexes. To clear this out:

--------- condition ----------   ----- index can be used ----   -- index search --
column = @some_constant_value          yes                        index seek
column IS NULL                         yes                        index seek

column > @some_constant_value    yes but no further columns       index scan
column < @some_constant_value    yes but no further columns       index scan
column >= @some_constant_value   yes but no further columns       index scan
column <= @some_constant_value   yes but no further columns       index scan

column <> @some_constant_value   yes but no further columns       full index scan
column IS NOT NULL               yes but no further columns       full index scan
columnA = columnB                yes if both columns are in it    full index scan

So, for your first query which has a condition:

WHERE category IS NOT NULL 
  AND per_nr IS NOT NULL 
  AND nid IS NOT NULL 

whatever order you put the columns in the composite index, a full index scan will be needed. The optimizer, seeing that the selectivity of this operation will be very low (meaning that even after the index scan it will still have a very high percentage of the rows passing the condition and all those rows will have to be retrieved from the table, decides to do a full scan of the table. And that is probably the best way to execute the query.

For the second query which has a condition:

WHERE per_nr > 47 
  AND source_category = 'Cereal Grains and Pasta' 
  AND food.nid <> 205014 
  AND food.nid IS NOT NULL

we'll get rid of the redundant food.nid IS NOT NULL (if it is different than 205014, it cannot be NULL anyway) and we'll order the conditions according to the above chart:

WHERE source_category = 'Cereal Grains and Pasta' 
  AND per_nr > 47 
  AND nid <> 205014 

This gives us a very useful hint about how to create the composite index:

(source_category)               : good index, it will be used to efficiently (index seek)
                                         narrow the search to rows 
                                         with 'Cereal Grains and Pasta' only
(source_category, per_nr)       : even better, it will be used to efficiently narrow the
                                         search to rows with 'Cereal Grains and Pasta' 
                                         and per_nr > 47. Then these rows will be read 
                                         from the table and the rows that have 
                                         (nid = 205014 or nid IS NULL) will be excluded 
(source_category, per_nr) INCLUDE (nid)
                                : even more better than the above index,
                                         it will also exclude the rows that have 
                                         (nid = 205014 or nid IS NULL) using the index
   -- !! --   Unfortunately, MySQL has not (yet) implemented `INCLUDE` indexes, 
   -- !! --       so the only improvement can be
(source_category, per_nr, nid)  : slightly better than the 2nd index,
                                         it will also exclude the rows that have 
                                         (nid = 205014 or nid IS NULL) using the index

So, add the 4th (or the 2nd) index:

ALTER TABLE food
  ADD INDEX source_category__per_nr__nid__IX   -- choose a name for the index
    (source_category, per_nr, nid) ;

if you want the second query to be executed efficiently:

SELECT name, nid, brand, cuisine, type, subcuisine, 
       course, per_nr, source, source_category 
FROM food 
WHERE per_nr > 47 
  AND source_category = 'Cereal Grains and Pasta' 
  AND nid <> 205014 ;