MySQL: How index speed up query execution

indexindex-tuningMySQLperformance

I have read multiple articles about MySQL indexes. Some of the important pages are below:
http://dev.mysql.com/doc/refman/5.6/en/mysql-indexes.html
https://stackoverflow.com/questions/3567981/how-do-mysql-indexes-work

Its correct that query using indexes executes much faster because there is no need to traverse every records. But as a novice RDBMS guy, all other remaining columns are dependent on indeed column. In a table with thousands of records, system need to traverse the index first (one by one, in any defined order I don't know).

To avoid a big description, I am limiting the scope of this question as

  • How MySQL traverse indexes?
  • How indexes arrangement (B-trees,
    hash,etc.) are determined?
  • What is the difference between traversing
    rows one by one and traversing index column value?

Best Answer

Here you have a video on the basic inner structure and inner workings of indexes. I recommend you to watch it all.

Basically, indexes are ordered structures on disk (although they can be cached, and they normally will for better performance) that will allow certain operations to be done faster. In particular, in MySQL, B-tree/B+tree (the most common ones) will allow you:

  • Perform filtering faster, with conditions like a=3, where a is an indexed column. This is possible because you can locate values that are 3 with less read operations
  • Obtain ordered results (ORDER by a): as values in an index are ordered, in certain cases, there will be no need to reorder them with a sorting algorithm, as we are already reading them in the right order
  • Reading values without accessing the whole row (covering index): SELECT a ... WHERE a = 3; as the index is smaller, it is less amount of bytes read, reduced operations and usually cached, improving read performance
  • Speeding up certain operations, like max(a) or min(a): the optimizer just needs to read the first or the last value on the index structure

In the case of B-trees, (look at this slide), if it is looking for the value 1, it only needs to read the root node. As 1 is before 3, it knows it is on the left node. In one step it discarded 50% of the reads. Imagine the advantage with millions of nodes, it is exponentially (or actually, logarithmically) better. If all rows have to be read, then reading may not be faster, in fact, it may be even slower. That is why when the selectivity is low MySQL refuses to use an index and performs a FULL TABLE SCAN. However, even in some cases, performing a full index scan may be an advantage (if less data has to be read overall), but it won't be as effective as if we perform a filtering with high selectivity.

In the case of hash indexes, the advantage comes from the use of a hash table. This can be even faster in some cases, but it cannot be used in all cases (for example, for > or < comparisons). The major engines on MySQL (InnoDB, MyISAM) do not support it, but they use internally in some structures hash maps.

MySQL will not select the index type automatically, it is selected on creation time (ADD INDEX my_index (col_a, col_b) USING BTREE), but defaults to BTREE.

MySQL and other DBRMSs support other type of indexes, too, like R-Trees, Fulltext, Spatial, etc.