Another way to get the result is this. It finds first all groups that the teacher has surely taught (or is going to) by checking that she has started within the month and then in another subquery it finds - for every group - the last teacher that started at the first day of the month or earlier.
With the unique index you have on the table, the second subquery should be quite efficient. The first subquery would benefit from an index on (teacherid, startdate, groupid)
:
SELECT groupid
FROM pupilgroupteacher
WHERE teacherid = @teacher
AND startdate >= @month + INTERVAL 1 DAY
AND startdate < @month + INTERVAL 1 MONTH
UNION DISTINCT
SELECT gg.groupid
FROM
( SELECT DISTINCT groupid
FROM pupilgroupteacher
) AS gd
JOIN pupilgroupteacher AS gg
ON gg.groupid = gd.groupid
AND gg.startdate =
( SELECT MAX(gi.startdate)
FROM pupilgroupteacher AS gi
WHERE gi.groupid = gd.groupid
AND gi.startdate < @month + INTERVAL 1 DAY
)
WHERE gg.teacherid = @teacher ;
Your indexes are fine for the two types of queries you mentioned.
This query will be satisfied by traversing the clustered index on the primary key...
[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;
...and this one is satisfied by the index on 'question_id':
[...] WHERE question_id = x;
The output of EXPLAIN SELECT
is not telling you what you think it is telling you, because the value shown in rows
is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. For InnoDB
these are based on index statistics.
rows
The rows column indicates the number of rows MySQL believes it must examine to execute the query.
For InnoDB tables, this number is an estimate, and may not always be exact.
— http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows
The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN
is the information the optimizer gathered about the plan it selected.
When type
is ref
and key
is not NULL
, this means that the name listed in the key
column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.
Note, sometimes you will see Using index
in the Extra
column and a lot of people assume that this means an index is being used, or that no index is being used when that doesn't appear, but that's not correct, either. Using index
describes a special case called a "covering index" -- it does not indicate whether an index is being used to locate the rows of interest.
It's possible that running ANALYZE [LOCAL] TABLE
would cause the numbers in rows
shown by EXPLAIN
to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE
is unlikely to make any actual difference in performance.
It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE
, because you are not inserting rows in primary key order (as would be the case with an auto_increment
primary key)... but on large tables this can be time-consuming because it rebuilds a new copy of the table... but, again, I wouldn't expect any significant change.
Best Answer
It's the transaction isolation that guarantees consistent results.
In your example, if you are concerned about possible changes from other threads between the first and the second query, you should enclose your queries in a single transaction: this is implicit if you have
autocommit
disabled, otherwise can be explicitly declared:But it must also be said that, in many real use cases (like a counter for visits on a web page), having a result which is consistent with the previous increment is not so important.
For a deep explanation about possible transaction isolation levels please refer to: http://dev.mysql.com/doc/refman/5.7/en/set-transaction.html