MySQL: How index speed up query execution

indexindex-tuningMySQLperformance

I have read multiple articles about MySQL indexes. Some of the important pages are below:
http://dev.mysql.com/doc/refman/5.6/en/mysql-indexes.html
https://stackoverflow.com/questions/3567981/how-do-mysql-indexes-work

Its correct that query using indexes executes much faster because there is no need to traverse every records. But as a novice RDBMS guy, all other remaining columns are dependent on indeed column. In a table with thousands of records, system need to traverse the index first (one by one, in any defined order I don't know).

To avoid a big description, I am limiting the scope of this question as

How MySQL traverse indexes?
How indexes arrangement (B-trees,
hash,etc.) are determined?
What is the difference between traversing
rows one by one and traversing index column value?

Best Answer

Here you have a video on the basic inner structure and inner workings of indexes. I recommend you to watch it all.

Basically, indexes are ordered structures on disk (although they can be cached, and they normally will for better performance) that will allow certain operations to be done faster. In particular, in MySQL, B-tree/B+tree (the most common ones) will allow you:

Perform filtering faster, with conditions like a=3, where a is an indexed column. This is possible because you can locate values that are 3 with less read operations
Obtain ordered results (ORDER by a): as values in an index are ordered, in certain cases, there will be no need to reorder them with a sorting algorithm, as we are already reading them in the right order
Reading values without accessing the whole row (covering index): SELECT a ... WHERE a = 3; as the index is smaller, it is less amount of bytes read, reduced operations and usually cached, improving read performance
Speeding up certain operations, like max(a) or min(a): the optimizer just needs to read the first or the last value on the index structure

In the case of B-trees, (look at this slide), if it is looking for the value 1, it only needs to read the root node. As 1 is before 3, it knows it is on the left node. In one step it discarded 50% of the reads. Imagine the advantage with millions of nodes, it is exponentially (or actually, logarithmically) better. If all rows have to be read, then reading may not be faster, in fact, it may be even slower. That is why when the selectivity is low MySQL refuses to use an index and performs a FULL TABLE SCAN. However, even in some cases, performing a full index scan may be an advantage (if less data has to be read overall), but it won't be as effective as if we perform a filtering with high selectivity.

In the case of hash indexes, the advantage comes from the use of a hash table. This can be even faster in some cases, but it cannot be used in all cases (for example, for > or < comparisons). The major engines on MySQL (InnoDB, MyISAM) do not support it, but they use internally in some structures hash maps.

MySQL will not select the index type automatically, it is selected on creation time (ADD INDEX my_index (col_a, col_b) USING BTREE), but defaults to BTREE.

MySQL and other DBRMSs support other type of indexes, too, like R-Trees, Fulltext, Spatial, etc.

Related Solutions

MySQL, “columns partitioning” when multiple columns are individiually important

You are right that mysql will check only one partition for a specific sender_id, but checks all the partitions for a specific receiver_id, as shown here:

mysql> explain partitions select * from messages where sender_id =5;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | messages | p0         | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain partitions select * from messages where receiver_id =5;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | messages | p0,p1,p3   | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

However, there are still benefits to that partitioning, depending on your hardware. When looking in all partitions for a receiver_id, mysql is really performing 3 select statements, one for each partition. It may be able to parallelize these select statements. Additionally, if you index receiver_id, it will be accessing 3, smaller, indicies.

In the end, you just have to do performance testing and see if it is paying off for your use case. Seeing as 100MB fits in RAM pretty easily these days, I wouldn't consider partitioning such a small table unless you have specific reasons to do so.

Mysql – Can spatial index be a part of a multiple columns index

No, you can't.

An index, in MySQL, will either be

a BTREE single or multi-column index.
a HASH index (available only for MEMORY tables), that can span several columns.
a SPATIAL index (available only for MyISAM tables).
a FULLTEXT index (available for MyISAM and 5.6.4+ InnoDB tables), that can span several columns of CHAR, VARCHAR or TEXT type.

You can't combine these types.

Best Answer

Related Solutions

MySQL, “columns partitioning” when multiple columns are individiually important

Mysql – Can spatial index be a part of a multiple columns index

Related Question