Mysql – Analyzing a query plan for a query with TWO predicates, but index only has ONE

database-theoryexecution-planMySQLoptimization

Note: This is from an introductory database systems course at my university so I'm not entirely sure if it matters, but for what it's worth we are being taught MySQL.


One homework question I'm doing right now is to analyze some query plans for a query with two predicates, specifically:

SELECT *
FROM Student
WHERE wam > 75 AND faculty = 'Arts';

however the plans provided assume an index only on one of the predicates, such as an unclustered B+ tree index on (wam).

One example part of the question is as follows:

Compute the estimated cost of the best plan assuming that an
unclustered B+ tree index on (wam) is the only index available.
Suppose there are 60 index pages. Discuss and calculate alternative
plans

How would I go about calculating the cost of a plan like this? Would I have to calculate costs for unclustered/clustered B+ tree and hash indexes for the other predicate, faculty, as well? I am aware that one alternative plan is also simply a full table scan.

"Student" has 800 pages, 64000 tuples. "faculty" can take 10 unique values. "wam" can take 100 unique values.

Thank you, if I'm not being specific enough feel free to point out anything lacking in this post.

Best Answer

The optimal index for that query is INDEX(faculty, wam), in that order.

MySQL does not (in most situations) use two different indexes. However, it will use a 'composite' index to good use in some situations, such as this.

Without knowing how many rows have Arts and home many have > 75, you cannot accurately conclude which of INDEX(wam) or INDEX(faculty) would be better. Nor can the Optimizer, since it's statistics gathering is not very good. SHOW INDEXES and EXPLAIN SELECT provide numbers that come from the same statistics, so you won't necessarily come to the optimal conclusion.

A "Hash" index (which InnoDB does not have) is essentially no better than a BTree index. A Hash index is useless for a range (eg, > 75). (Hence, MySQL uses BTrees, and does not provide Hash indexes.)

Another point: If more than a small percentage of the table has wam > 75, INDEX(wam) will be shunned; scanning the entire table is likely to be faster than bouncing between the index's BTree and the data BTree. Again > 75 and 60 pages and 10 and 100 give no good clues of which path the Optimizer should take. The answer to the quoted question is "it depends" or "not enough info".

You could tackle the question by presenting two answers: One where very few rows have wam > 75, and one where most meet that predicate.

Has your instructor had any experience using databases, or only textbooks?

More discussion on devising optimal indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql