MySQL vs MariaDB – Do They Have SQL Server’s ‘Included Columns’ for Indexes?

index, mariadb, mysql

In SQL Server you can create an index and include additional columns on it, which can help performance in certain circumstances. Is the same ability available in MySQL or MariaDB, perhaps under a different name? I was unable to find anything using the term "included columns".

Best Answer

Looking at the MySQL Documentation, the glossary indicates this about Covering Indexes:

An index that includes all the columns retrieved by a query. Instead of using the index values as pointers to find the full table rows, the query returns values from the index structure, saving disk I/O. InnoDB can apply this optimization technique to more indexes than MyISAM can, because InnoDB secondary indexes also include the primary key columns. InnoDB cannot apply this technique for queries against tables modified by a transaction, until that transaction ends.

Any column index or composite index could act as a covering index, given the right query. Design your indexes and queries to take advantage of this optimization technique wherever possible.

The implication is that MySQL has no direct equivalent of SQL Server's INCLUDE clause. However, if an index contains every column a query references, it is still a covering index for that query, much like an index with included columns in SQL Server would be.
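As a rough sketch of that workaround: the extra columns are appended as trailing key parts instead of being listed in an INCLUDE clause. The orders table and the index name below are hypothetical and not part of the question. Unlike SQL Server's included columns, these trailing columns become part of the index key, so they are kept in sorted order and count toward the index width limit.

```sql
-- Hypothetical table for illustration; none of these names come from the question.
CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT NOT NULL,
    status      VARCHAR(20) NOT NULL,
    total       DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

-- SQL Server form, shown only for comparison (not valid in MySQL/MariaDB):
--   CREATE INDEX ix_orders_customer ON orders (customer_id) INCLUDE (status, total);

-- MySQL/MariaDB approximation: append the extra columns as trailing key parts.
CREATE INDEX ix_orders_customer ON orders (customer_id, status, total);

-- When the index covers the query, the Extra column of the EXPLAIN output
-- is expected to contain "Using index".
EXPLAIN SELECT status, total FROM orders WHERE customer_id = 42;
```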

MariaDB has the following in their documentation for covering indexes:

A "Covering" index is an index that contains all the columns in the SELECT. It is special in that the SELECT can be completed by looking only at the INDEX BTree. (Since InnoDB's PRIMARY KEY is clustered with the data, "covering" is of no benefit when considering at the PRIMARY KEY.)

Mini-cookbook:

Gather the list of column(s) according to the "Algorithm", above.

Add to the end of the list the rest of the columns seen in the SELECT, in any order.

Examples:

SELECT x FROM t WHERE y = 5; ⇒ INDEX(y,x) -- The algorithm said just INDEX(y)  
SELECT x,z FROM t WHERE y = 5 AND q = 7; ⇒ INDEX(y,q,x,z) -- y and q in either order (Algorithm), then x and z in either order (covering).  
SELECT x FROM t WHERE y > 5 AND q > 7; ⇒ INDEX(y,q,x) -- y or q first (that's as far as the Algorithm goes), then the other two fields afterwards.

The speedup you get might be minor, or it might be spectacular; it is hard to predict.

But...

It is not wise to build an index with lots of columns. Let's cut it off at 5 (Rule of Thumb).

Prefix indexes cannot 'cover', so don't use them anywhere in a 'covering' index.

There are limits (3KB?) on how 'wide' an index can be, so "covering" may not be possible.
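The prefix-index caveat can be checked with a small experiment. This is only a sketch: the table t reuses the column names x and y from the examples above, but the column types and the index names are assumptions made for illustration.

```sql
-- Hypothetical definition of t; types and index names are assumed.
CREATE TABLE t (
    id INT PRIMARY KEY,
    y  INT NOT NULL,
    x  VARCHAR(100) NOT NULL,
    INDEX idx_y_xprefix (y, x(20))   -- only the first 20 characters of x are indexed
) ENGINE=InnoDB;

-- The index does not hold the full value of x, so the row must still be read;
-- "Using index" is not expected in the Extra column here.
EXPLAIN SELECT x FROM t WHERE y = 5;

-- Swap the prefix index for one on the full column.
ALTER TABLE t
    DROP INDEX idx_y_xprefix,
    ADD INDEX idx_y_x (y, x);

-- Now the index holds everything the query needs; "Using index" is expected.
EXPLAIN SELECT x FROM t WHERE y = 5;
```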

Related Solutions

How does Oracle handle composite index lookups

Loosely speaking, the CBO may choose to:

1. build up a list of all possible values for the 'missing' leading columns (this can be done fairly efficiently from the index structure itself)
2. iteratively perform range scans for each combination of missing columns and the column provided
3. union the whole lot together in one result set

This is what is called a 'skip scan' in Oracle terminology. Skip scans work best when the number of possible values in step (1) is relatively small (that is, small compared to the size of the index).

Under what circumstances can Oracle (at least in 11g) do a lookup without the left-most prefix columns existing in the query?

Oracle will use statistics to estimate the cardinality of step (1) before weighing up whether performing that many range scans will cost more than just scanning the whole index sequentially.
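The logical effect of a skip scan can be sketched in plain SQL. The employees table, its columns, and the index name are hypothetical, and the rewrite assumes gender is NOT NULL and takes only the values 'F' and 'M'. It illustrates the idea only; Oracle performs the equivalent work inside a single index access rather than by rewriting the query.

```sql
-- Hypothetical Oracle table and index; names are assumed for illustration.
CREATE TABLE employees (
    emp_id NUMBER       NOT NULL,
    gender CHAR(1)      NOT NULL,
    name   VARCHAR2(50)
);

CREATE INDEX emp_gender_id ON employees (gender, emp_id);

-- The leading column (gender) is missing from the predicate.
-- The INDEX_SS hint asks for a skip scan; the optimizer may still choose otherwise.
SELECT /*+ INDEX_SS(e emp_gender_id) */ name
FROM employees e
WHERE emp_id = 123;

-- Conceptually, the skip scan behaves like one range scan per distinct
-- leading value, with the results combined:
SELECT name FROM employees WHERE gender = 'F' AND emp_id = 123
UNION ALL
SELECT name FROM employees WHERE gender = 'M' AND emp_id = 123;
```

With only two distinct leading values, the cost is roughly two range scans, which is why a low-cardinality leading column suits this access path.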

MySQL – Combining columns in index

The index design has to follow the queries you know you are going to run.

Let's go back to the original question's query:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM blogposts
JOIN articles ON articles.blogpost_id = blogposts.id
WHERE blogposts.deleted = 0
AND blogposts.title LIKE '%{de}%'
AND blogposts.visible = 1
AND blogposts.date_published <= NOW()
ORDER BY blogposts.date_created DESC
LIMIT 0 , 50

Look carefully at the WHERE clause. I see two static values:

blogposts.deleted = 0
blogposts.visible = 1

Neither column is a good candidate for an index by itself because of its low cardinality. Think about it:

deleted is either 0 or 1 (That's a Cardinality of 2)
visible is either 0 or 1 (That's a Cardinality of 2)

All Query Optimizers (MySQL, Oracle, PostgreSQL, MSSQL, etc.) would take one look at such indexes and decide not to use them.

Now look at the ORDER BY clause: ORDER BY blogposts.date_created DESC. The rows are always requested in date_created order, so an index that already stores them in that order may remove the need for a separate sort.

Combining these three columns into a single index gives the Query Optimizer some relief, because the data for deleted=0, visible=1, and date_created is then already gathered together in one place.
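A minimal sketch of such an index follows. The index name is hypothetical, and the EXPLAIN query is a simplified version of the original, reduced to the blogposts columns discussed here. It also assumes blogposts.id is the InnoDB primary key, which the JOIN suggests but the question does not state.

```sql
-- Equality columns first, then the ORDER BY column.
ALTER TABLE blogposts
    ADD INDEX idx_deleted_visible_created (deleted, visible, date_created);

-- With both equality predicates fixed, matching rows can be read from the index
-- in date_created order (backwards for DESC), so "Using filesort" is not
-- expected in the Extra column. If id is the primary key, InnoDB also stores it
-- in this secondary index, so the simplified query may show "Using index".
EXPLAIN
SELECT id
FROM blogposts
WHERE deleted = 0
  AND visible = 1
ORDER BY date_created DESC
LIMIT 50;
```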

Although MySQL is perfectly capable of index merging, you should never expect the Query Optimizer to merge indexes with very low cardinalities. Even if it did, expect poor performance. Query Optimizers will choose full table scans, full index scans, and range scans over lopsided index merges any day. Creating covering indexes can avoid the need for index merges, but it can become a burden, or a big waste of time and disk space, if the cardinality of the multiple-column index is still too low; in that case the Query Optimizer would still choose not to use it. Therefore, you must know your key distribution well in order to design multiple-column indexes that your queries will actually use.
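One way to inspect the key distribution before committing to an index, assuming the blogposts table from the query above:

```sql
-- Row counts per (deleted, visible) combination. If nearly every row has
-- deleted = 0 and visible = 1, these two columns filter out very little.
SELECT deleted, visible, COUNT(*) AS row_count
FROM blogposts
GROUP BY deleted, visible
ORDER BY row_count DESC;

-- The Cardinality column shows the optimizer's estimate of the number of
-- distinct values for each existing index prefix.
SHOW INDEX FROM blogposts;
```

If a single combination accounts for almost all rows, most of the benefit of a composite index would come from the trailing date_created column rather than from the two flags.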


As far as the date_published column goes, it would be all right to add it to the index. It would bring that much more relief to the Query Optimizer. It will perform an index scan to test for <= NOW(), but that's OK. That's what covering indexes are for.
