Mysql – Query with Indexes slower than without Indexes

indexinnodbMySQLmysql-5.6optimization

I'm using MySQL 5.6 and my storage engine is InnoDB.

I have a table with 1 million rows containing with the columns:

  • ID (primary key)
  • FirstName
  • LastName
  • foreign_key_id (foreign key, NOT NULL)
  • foreign_key_id2 (another foreign key, default NULL)

The rows are seperated under:

  • 25% with foreign_key_id value 1 and foreign_key_id2 NULL
  • 25% with foreign_key_id value 1 and foreign_key_id2 NOT NULL
  • 25% with foreign_key_id value 2 and foreign_key_id2 NULL
  • 25% with foreign_key_id value 2 and foreign_key_id2 NOT NULL

With the following indexes:

  • index foreign_key_idx on foreign_key_id
  • index foreign_key_2_idx on and foreign_key_id2
  • composite index foreign_key_comp_idx on (foreign_key_idx, foreign_key_2_idx)

I perform the following queries:

Query 1 – without indexes:

SELECT *
FROM table tbl
IGNORE INDEX(foreign_key_idx, foreign_key_2_idx, foreign_key_comp_idx)
WHERE tbl.foreign_key_id = 1 AND tbl.foreign_key_id2 IS NOT NULL

Query 2 – with indexes (no composite index):

SELECT *
FROM table tbl
IGNORE INDEX(foreign_key_comp_idx)
WHERE tbl.foreign_key_id = 1 AND tbl.foreign_key_id2 IS NOT NULL

Query 3 – with composite index (no other indexes):

SELECT *
FROM table tbl
IGNORE INDEX(foreign_key_idx, foreign_key_2_idx)
WHERE tbl.foreign_key_id = 1 AND tbl.foreign_key_id2 IS NOT NULL

The results:

Query 1 (no indexes) performs a full table scan and uses 1 million records with a
total duration of 0.37 seconds.

Query 2 (indexes, no composite index) performs a non-unique key lookup on foreign_key_idx index and
uses 500K records with a total duration of 0.6 seconds.

Query 3 (composite index only) performs an index range scan on composite index and uses 480K
records with a total duration of 0.13 seconds.

What I really don't understand is: why is query 2 (with indexes) always performing slower than query 1 (without indexes)? I'm really really stuck and need some help…

I've tested the queries above with different amount of rows, like 1k, 10k, 20k, 50k, 100k, 200k, 250k, 500k, 1M etc, always with the same ratio (25%), and the results where the same (query 2 always performing slow)

Thank you in advance, really appreciate any kind of input!

Edit (2 May 2016)

SHOW CREATE TABLE COMMAND:

CREATE TABLE `table` (
   `ID` int(11) NOT NULL AUTO_INCREMENT,
   `FirstName` varchar(255) NOT NULL,
   `LastName` varchar(255) NOT NULL,
   `foreign_key_id` int(11) NOT NULL,
   `foreign_key_id2` int(11) DEFAULT NULL,

   PRIMARY KEY (`ID`),
   KEY `foreign_key_idx` (`foreign_key_id`),
   KEY `foreign_key_2_idx` (`foreign_key_id2`),
   KEY `foreign_key_comp_idx ` (`foreign_key_id`,`foreign_key_id2`),

   CONSTRAINT `foreign_key_idx` FOREIGN KEY (`foreign_key_id`) REFERENCES `table2` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
   CONSTRAINT `foreign_key_2_idx` FOREIGN KEY (`foreign_key_id2`) REFERENCES `table3` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
 ) ENGINE=InnoDB AUTO_INCREMENT=1515998 DEFAULT CHARSET=latin1

EXPLAIN PLANS:
enter image description here

Not sure if important, but table2 has 20 records and table3 also 1 million.

Best Answer

My understanding: The table contains 1M rows of which 250k are returned by the query. There are 500k rows with foreign_key_id = 1 and 500k rows with af.foreign_key_id2 IS NOT NULL.

The query using full table scan (actually doing full index scan on the PRIMARY key in InnoDB) will read all 1M rows sequentially and check each of them for the conditions.

The query using foreign_key_idx (and it should be the same if it used 'foreign_key_2_idx') has to read 500k rows by "random" order (supposing the rows were inserted or assigned the ID randomly) and check the other condition on them. That means the query reads 500k rows by their primary key from the table - but that means it will probably access 100% of data pages of the table. So in the end the query reads half of the index and all of the table - more data to go through in total and the table is accessed by random seeks, not sequentialy. It should be no surprise that such query is slower than full table scan.

The query going by foreign_key_comp_idx will find the part of the index for foreign_key_id = 1 by ref access method and then fetch its part satisfying af.foreign_key_id2 IS NOT NULL by range access - finding 250k rows which it will then read from the table. Again there is a high probability that it will have to read more than 50% of the table pages but it may just be lucky with the data distribution that it is less than 100% and as a plus it does not have to check the conditions again as they are assured by the index.

What surprises me: usual understanding is that the optimizer will not even want to use an index if it will lead to fetching ~20% of the table rows, prefering full table scan instead as it is usually faster. Your IGNORE INDEX() hints should not change that, thats what FORCE INDEX() hint is for (telling optimizer that the table scan is extremely costly compared to the suggested index).

But maybe if the insert order was not totally random, the statistics and index dives show that (even when the expected rows numbers do not).