In our MySQL 5.5 database, we have following InnoDB table with 30M+ rows:
+----------------+-------------+-----------------+--------+
| participant_id | question_id | given_answer_id | status |
+----------------+-------------+-----------------+--------+
| 500 | 12 | 25 | 0 |
+----------------+-------------+-----------------+--------+
The combination participant_id + question_id + given_answer_id is unique.
At the moment we have following keys:
PRIMARY KEY (in this order)
- participant_id
- question_id
- given_answer_id
INDEX KEY
- question_id
For this table, we have two kinds of select queries in our application:
[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;
and
[...] WHERE question_id = x;
Normally, every participant_id has between 0 and <100 rows with different question_ids. The other way around, every question_id can have unlimited (usually not more than 100 000) rows with different participant_ids. The first query is executed way more often than the second one.
When we execute following query, it shows us that 32096 rows where looked up:
EXPLAIN SELECT * FROM example WHERE question_id = 500;
+----+-------------+-----------+------+---------------+-------------+---------+-------+-------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------------+---------+-------+-------+-------+
| 1 | SIMPLE | example | ref | question_id | question_id | 8 | const | 32096 | |
+----+-------------+-----------+------+---------------+-------------+---------+-------+-------+-------+
Whereas, when we execute the same query without EXPLAIN only 18732 rows are returned.
What indexes do we need on this table in order to prevent this overhead but still performing for both kinds of queries?
Here's is the code to create this table:
CREATE TABLE `example` ( `participant_id` BIGINT(20) UNSIGNED NOT NULL, `question_id` BIGINT(20) UNSIGNED NOT NULL, `given_answer_id` BIGINT(20) UNSIGNED NOT NULL, `status` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0', PRIMARY KEY (`participant_id`, `question_id`, `given_answer_id`), INDEX `question_id` (`question_id`) ) ENGINE=InnoDB;
Best Answer
Your indexes are fine for the two types of queries you mentioned.
This query will be satisfied by traversing the clustered index on the primary key...
...and this one is satisfied by the index on 'question_id':
The output of
EXPLAIN SELECT
is not telling you what you think it is telling you, because the value shown inrows
is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. ForInnoDB
these are based on index statistics.The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in
EXPLAIN
is the information the optimizer gathered about the plan it selected.When
type
isref
andkey
is notNULL
, this means that the name listed in thekey
column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.Note, sometimes you will see
Using index
in theExtra
column and a lot of people assume that this means an index is being used, or that no index is being used when that doesn't appear, but that's not correct, either.Using index
describes a special case called a "covering index" -- it does not indicate whether an index is being used to locate the rows of interest.It's possible that running
ANALYZE [LOCAL] TABLE
would cause the numbers inrows
shown byEXPLAIN
to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, soANALYZE TABLE
is unlikely to make any actual difference in performance.It is possible, however, that your overall performance might see some marginal improvement with an occasional
OPTIMIZE [LOCAL] TABLE
, because you are not inserting rows in primary key order (as would be the case with anauto_increment
primary key)... but on large tables this can be time-consuming because it rebuilds a new copy of the table... but, again, I wouldn't expect any significant change.