MySQL – Combining columns in an index

index, MySQL, performance, query-performance

With reference to RolandoMySQLDBA's answer on

MySQL: Index when joining to tables not being used (Performance optimizing question)

He mentioned

Make sure you build an index that involves the columns deleted, visible, and date_created as follows:

ALTER TABLE blogposts ADD INDEX deleted_visible_date_created (deleted,visible,date_created);

May I ask, what is the benefit of combining columns in an index? And why did you not include date_published in the index, when it was also part of the WHERE clause?

Should we create indexes on all columns used in the WHERE clause or not?

Best Answer

What has to be adhered to is the query you know you are going to run: the index should be built to fit it.

Let's go back to the original question's query:

SELECT /* [things omitted] */ articles.blogpost_id, articles.id AS articleid
FROM blogposts
JOIN articles ON articles.blogpost_id = blogposts.id
WHERE blogposts.deleted = 0
AND blogposts.title LIKE '%{de}%'
AND blogposts.visible = 1
AND blogposts.date_published <= NOW()
ORDER BY blogposts.date_created DESC
LIMIT 0 , 50

Look carefully at the WHERE clause. I see two static values

  • blogposts.deleted = 0
  • blogposts.visible = 1

Neither column is worth indexing by itself because of its implied cardinality. Think about it:

  • deleted is either 0 or 1 (That's a Cardinality of 2)
  • visible is either 0 or 1 (That's a Cardinality of 2)

All Query Optimizers (MySQL, Oracle, PostgreSQL, MSSQL, etc.) would take one look at a single-column index on either of these and decide not to use it.
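To check this on your own data, here is a minimal sketch, assuming the blogposts table from the question. The first statement counts the distinct values directly; the second shows the estimate the optimizer works from, in the Cardinality column of its output.

```sql
-- Exact distinct-value counts for the two flag columns (scans the table).
SELECT COUNT(DISTINCT deleted) AS deleted_cardinality,
       COUNT(DISTINCT visible) AS visible_cardinality,
       COUNT(*)                AS total_rows
FROM blogposts;

-- Estimated cardinality of each indexed column, as the optimizer sees it.
SHOW INDEX FROM blogposts;
```

A cardinality of 2 against a large total_rows is the situation described above.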

Look at the ORDER BY clause: ORDER BY blogposts.date_created DESC. This gives a predictable ordering. Because deleted and visible are each pinned to a single value, the matching index entries are already sorted by date_created, which may bypass the need for sorting.

Combining these three columns into a single index gives the Query Optimizer some relief by gathering data with deleted=0, visible=1, and date_created already merged.
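One way to see whether the combined index is actually picked up is to run EXPLAIN on the query from the question. This is only a sketch: the plan depends on your data and MySQL version, and the optimizer may still choose a different join order.

```sql
-- Ask MySQL for the execution plan without running the query.
EXPLAIN
SELECT articles.blogpost_id, articles.id AS articleid
FROM blogposts
JOIN articles ON articles.blogpost_id = blogposts.id
WHERE blogposts.deleted = 0
  AND blogposts.title LIKE '%{de}%'
  AND blogposts.visible = 1
  AND blogposts.date_published <= NOW()
ORDER BY blogposts.date_created DESC
LIMIT 0, 50;
```

In the row for blogposts, look for deleted_visible_date_created in the key column, and check whether the Extra column still reports "Using filesort".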

Although MySQL is perfectly capable of doing Index Merging, you should never expect the Query Optimizer to do index merging on indexes with very low cardinalities. Even if it did, just expect poor performance. Query Optimizers will choose full table scans, full index scans, and range scans over lopsided index merges any day. Creating covering indexes can bypass having to do index merges, but it can become a burden or a big waste of time and disk space if the cardinality of the multiple-column index is still too low, because the Query Optimizer would still choose not to use it. Therefore, you must know your key distribution well in order to formulate multiple-column indexes that your queries will actually use.
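To get a rough feel for the key distribution, a simple sketch (again assuming the blogposts table from the question) is to count the rows under each combination of the two flag columns:

```sql
-- How many rows fall under each (deleted, visible) combination.
SELECT deleted, visible, COUNT(*) AS row_count
FROM blogposts
GROUP BY deleted, visible
ORDER BY row_count DESC;
```

If the deleted=0, visible=1 combination holds nearly all of the rows, the two leading columns filter very little, and most of the index's value comes from the date_created ordering.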

As far as the date_published column goes, it would be all right to add it to the index. It would bring that much more relief to the Query Optimizer. It will perform an index scan to test for <=NOW(), but that's OK. That's what covering indexes are for.
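A sketch of what that extended index could look like follows. The index name is hypothetical, and placing date_published last is my assumption rather than something stated above: it keeps the (deleted, visible, date_created) prefix intact, so the ORDER BY can still be served from the index.

```sql
-- Hypothetical index name; date_published goes last so that the
-- (deleted, visible, date_created) prefix still matches the ORDER BY.
ALTER TABLE blogposts
  ADD INDEX deleted_visible_created_published
    (deleted, visible, date_created, date_published);
```

The original three-column index is a left prefix of this one, so it would become redundant and could be dropped once this index is in place.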