Mysql – Benefits of BTREE in MySQL

Tags: btree, index, mysql

What are the pros and cons of using a BTREE index in MySQL, regarding query speed, disk storage and memory usage?

- Does a BTREE make it easier to iterate in increasing order?
- What kinds of queries benefit from a BTREE?
- Are there any disadvantages to using a BTREE index?
- Does it increase storage space or indexing time?

Best Answer

Regardless of the Storage Engine (MyISAM or InnoDB), when it comes to BTREEs, you must make sure you understand the following characteristics:

- Keys should be as small as possible
- Random Keys for PRIMARY KEYs
  - Insertions (Bulk or Programmatic) will perform root node and internal splitting periodically
  - Introduces Overhead early in an index's life
  - Breeds node fragmentation (especially for index pages)
  - Causes Index Scans for Queries to be performed Randomly
- Ordered Keys for PRIMARY KEYs
  - Bulk Ordered Insertions delay root node and internal splitting
  - Reloading data via mysqldump files and LOAD DATA INFILE commands promote the use of sorting mechanisms to address index initialization/reorganization (See my Oct 26, 2012 post: How badly does innodb fragment in the face of somewhat out-of-order insertions?)
  - Programmatic Ordered Insertions promote root node and internal splitting of index pages in 45% of the cases
  - Delays creation of Overhead
  - Prevents node fragmentation
  - Causes Index Scans for Queries to be performed Sequentially (less disk I/O)
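
The contrast between random and ordered keys can be illustrated with a toy model. The sketch below is not InnoDB's or MyISAM's actual algorithm: it assumes fixed-capacity leaf pages, a 50/50 split when a full page receives a key in its middle, and a fresh page when a key lands past the end of the last page (a simplification of how engines treat sequential inserts). The function names and the page capacity are hypothetical.

```python
import bisect
import random


def simulate_leaf_pages(keys, capacity=64):
    """Insert keys into toy sorted leaf pages; return (page_count, splits, fill)."""
    pages = [[]]                # each page is a sorted list of keys
    lows = [float("-inf")]      # lows[i] is the smallest key routed to pages[i]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Assumption: appending past the last full page opens a fresh page.
            pages.append([key])
            lows.append(key)
        else:
            # Full page hit in the middle: move the upper half to a new page.
            mid = capacity // 2
            right = page[mid:]
            del page[mid:]
            pages.insert(i + 1, right)
            lows.insert(i + 1, right[0])
            splits += 1
            bisect.insort(right if key >= right[0] else page, key)
    fill = len(keys) / (len(pages) * capacity)
    return len(pages), splits, fill


def compare_insert_orders(n=10_000, seed=1):
    """Run the toy model on the same keys in ascending and in shuffled order."""
    ordered = list(range(n))
    shuffled = ordered[:]
    random.Random(seed).shuffle(shuffled)
    return simulate_leaf_pages(ordered), simulate_leaf_pages(shuffled)
```

In this model the ascending run never splits a page in the middle and leaves pages almost full, while the shuffled run splits repeatedly and ends with more, emptier pages.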

BTREEs in InnoDB tend to be more bloated than their MyISAM counterparts because of InnoDB's clustered index (gen_clust_index), where the row data live.
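
To put rough numbers on why key size matters for index size and depth, here is a back-of-the-envelope sketch. It assumes InnoDB's default 16 KB page, a hypothetical 6-byte child pointer per entry and a 70% page fill, and it ignores record headers entirely, so treat the results as orders of magnitude only. The function names are made up for this illustration.

```python
PAGE_BYTES = 16 * 1024  # InnoDB's default page size


def entries_per_page(key_bytes, pointer_bytes=6, fill=0.7):
    """Rough number of index entries per page (pointer size and fill are assumptions)."""
    usable = PAGE_BYTES * fill
    return max(2, int(usable // (key_bytes + pointer_bytes)))


def estimated_levels(rows, key_bytes):
    """Number of B-tree levels needed to address `rows` entries in this toy model."""
    fanout = entries_per_page(key_bytes)
    levels, reachable = 1, fanout
    while reachable < rows:
        reachable *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    for key_bytes in (4, 36, 255):
        print(key_bytes, entries_per_page(key_bytes), estimated_levels(10**8, key_bytes))
```

Under these assumptions a 4-byte key needs fewer levels than a 255-byte key for the same row count, which means fewer page reads per lookup and a smaller index overall.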

The PRIMARY KEY of an InnoDB table points right to its gen_clust_index. Secondary indexes always include a PRIMARY KEY entry. If you run a query that uses a secondary index and also has non-indexed columns in the WHERE clause, you could easily be doing two index lookups: one in the secondary index and one in the clustered index to fetch the row. With that in mind, make sure your secondary indexes contain all the columns your queries' WHERE clauses need (a.k.a. covering indexes).
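
The double lookup can be observed in a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, since it needs no server; SQLite is not InnoDB, but it also stores rows in a B-tree keyed by the primary key and must visit that tree whenever a secondary index lacks a column the query needs. The table, column and index names are hypothetical. In MySQL the equivalent check would be EXPLAIN showing "Using index" in the Extra column.

```python
import sqlite3

QUERY = "SELECT total FROM orders WHERE customer_id = 42 AND status = 'paid'"


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


def covering_index_demo():
    """Return the plan with a narrow index, then with a covering index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
    )
    # Narrow index: status and total must be fetched from the table's own B-tree.
    conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
    narrow_plan = query_plan(conn, QUERY)

    # Wider index holding every column the query touches.
    conn.execute("DROP INDEX idx_customer")
    conn.execute(
        "CREATE INDEX idx_customer_status_total ON orders (customer_id, status, total)"
    )
    covering_plan = query_plan(conn, QUERY)
    conn.close()
    return narrow_plan, covering_plan


if __name__ == "__main__":
    for plan in covering_index_demo():
        print(plan)
```

The second plan reports a covering index, meaning the query is answered from the index alone with no second lookup.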

Related Solutions

Indexing from start or when performance problem arises

Should I start indexing right from the start, or only when a performance problem arises?

Indexing strategy tends to evolve as usage patterns emerge. That said, there are also strategies and design guidelines that can be applied up front.

- Choose a good clustering key. You can usually determine the appropriate clustered index at design time, based on the expected pattern of inserts to a table. If a compelling case for a change emerges in the future, so be it.
- Create your primary key and other unique constraints. These will be enforced by unique indexes.
- Create your foreign keys and associated non-clustered indexes. Foreign keys are your most frequently referenced join columns, so index them from the start.
- Create indexes for any obviously highly selective queries, that is, query patterns you already know will be highly selective and likely to use lookups rather than scans.

Beyond the above, take a gradual and holistic approach to implementing new indexes. By holistic, I mean assess the potential benefit and impact to all queries and existing indexes when evaluating an addition.

A common problem in SQL Server circles is overindexing, as a result of guidance from the missing index DMVs and SSMS hints. Neither of these tools evaluates existing indexes, and both will merrily suggest you create a new six-column index rather than add a single column to an existing five-column index.

```
-- If you have this
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable]
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
);

-- But your query would benefit from the addition of a column
-- (DROP_EXISTING rebuilds the existing index under the same name)
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable]
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
)
WITH (DROP_EXISTING = ON);

-- SSMS will suggest you create this instead
CREATE NONCLUSTERED INDEX [IX_MyTable_AnotherIndexWithTheSameColumnsAsTheExistingIndexPlusCol6] ON [dbo].[MyTable]
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
);
```

Kimberly Tripp has some excellent material on indexing strategy that, while SQL Server focused, is applicable to other platforms. For the SQL Server folk, there are some handy tools for identifying duplicates like the example above.
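
The kind of overlap shown in the example can also be checked by hand. The following is a minimal sketch, not one of the tools mentioned above: it assumes you have already extracted each index's ordered key columns into a dictionary, and it ignores sort direction, uniqueness, included columns and filters, all of which matter before actually dropping anything. The function name is hypothetical.

```python
def find_prefix_redundant(indexes):
    """Return (redundant, wider) pairs where one index's columns lead another's.

    `indexes` maps an index name to its ordered tuple of key columns.
    Exact duplicates are reported once, with the later name as the redundant one.
    """
    pairs = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name == other:
                continue
            is_prefix = tuple(other_cols[: len(cols)]) == tuple(cols)
            if is_prefix and (len(cols) < len(other_cols) or name > other):
                pairs.append((name, other))
    return pairs


if __name__ == "__main__":
    existing = {
        "IX_MyTable_MyIndex": ("col1", "col2", "col3", "col4", "col5"),
        "IX_MyTable_PlusCol6": ("col1", "col2", "col3", "col4", "col5", "col6"),
    }
    print(find_prefix_redundant(existing))
```

A pair in the result means the first index is a leading prefix of the second, so queries served by the narrower index can usually be served by the wider one.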

We can also create a temporary index while executing a query. What are the pros and cons of such techniques?

This usually applies only to rarely run queries, typically ETL. You need to assess:

- Does creating the index save more query execution time than the index takes to build?
- Does the maintenance overhead of leaving the index in place outweigh the time taken to create and drop it when it's needed?
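
Both questions reduce to simple comparisons once you have measured the timings. The sketch below only spells out that arithmetic; the function names and the idea of expressing maintenance overhead as seconds per day are assumptions made for illustration, and the numbers have to come from your own measurements.

```python
def temp_index_pays_off(build_s, query_with_index_s, query_without_index_s):
    """True when building the index and then querying beats querying without it."""
    return build_s + query_with_index_s < query_without_index_s


def keep_index_instead(build_s, drop_s, maintenance_s_per_day, days_between_runs):
    """True when upkeep between runs costs less than dropping and rebuilding."""
    return maintenance_s_per_day * days_between_runs < build_s + drop_s


if __name__ == "__main__":
    # Hypothetical timings: 30 s build, 5 s indexed query, 120 s unindexed query.
    print(temp_index_pays_off(30, 5, 120))
    # Hypothetical upkeep: 0.5 s per day of extra write cost, weekly ETL run.
    print(keep_index_instead(30, 1, 0.5, 7))
```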

Mysql – How to store static list of hashes in MySQL effectively

Some options:

Use InnoDB. If not for these reasons, then for compression! With InnoDB you can:
```
ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```
Replace 8 with 4 or 2 or 1 to (hopefully) get better compression. Since the table is static, I think this is a great solution for you (with strong compression, writes become slower, but you don't care about that).

Not only is the data compressed; the indexes are too. I would suggest this is the easiest option you have.
Only index part of your column. You agree that the column is not UNIQUE but can be assumed to be. OK; is your index a UNIQUE index? If not, try to:
```
ALTER TABLE my_table ADD INDEX `md5_prefix_idx` (md5(8));
```
to index only the first 8 characters of the column (8 bytes if the column is a binary type).
Try TokuDB, which is an alternative storage engine to InnoDB and which has amazing compression (I've seen data get 20 times smaller than InnoDB with TokuDB aggressive compression).
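
For the prefix-index option, it helps to check how distinct the prefixes are before choosing a length. The sketch below measures that offline on hex-encoded MD5 strings; it assumes the column stores the 32-character hex form, and the sample values and function names are invented for the illustration. Against a real table, the same ratio could be computed with COUNT(DISTINCT LEFT(md5, 8)) / COUNT(*).

```python
import hashlib


def md5_hex(values):
    """Hex MD5 digests for an iterable of strings."""
    return [hashlib.md5(v.encode("utf-8")).hexdigest() for v in values]


def prefix_selectivity(hashes, prefix_len):
    """Distinct prefixes divided by row count; 1.0 means every prefix is unique."""
    if not hashes:
        return 0.0
    return len({h[:prefix_len] for h in hashes}) / len(hashes)


if __name__ == "__main__":
    sample = md5_hex(f"item-{i}" for i in range(10_000))  # stand-in data
    for length in (1, 4, 8, 32):
        print(length, prefix_selectivity(sample, length))
```

A selectivity close to 1.0 at a short length suggests the prefix index will narrow lookups almost as well as a full-column index while storing far fewer bytes per entry.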
