Mysql – Does it ever make sense to create an index with additional columns after the primary key

indexMySQLperformance

I am looking through our MySQL 5.7.16 database and I noticed that there are some indexes that use the primary key as the first column in the index, followed by additional columns. Intuitively, this doesn't seem to make sense to me since the index would always narrow it down to one row by using the first column. However, I am not sure if there are any special circumstances in which it makes sense to have this kind of index. Is there any situation where this would improve performance?

Best Answer

Indexes have more than one use. The primary use (in most cases) is to quickly identify the rows that meet a query's conditions. When doing this, the leftmost columns of the index are considered.

Say you have a table, users, with 100 columns, a mix of dates, strings, and numbers. Five of the columns in the table are: id, username, country, acct_expir_date, and access_level (an int). id is the primary key, both it and username are unique.

Let's say there's a (non-clustered) index on country and username. The query:

SELECT `username`, `acct_expir_date` FROM `users` WHERE country = 'UNITED STATES';

could use this index to identify all users from that country, locate their records in the actual table, and return the requested data.

However, the query:

SELECT `username`, `country` FROM `users` WHERE `username` LIKE 'g%';

would not be able to use the index - the leftmost column in the index, country, is not a part of the WHERE clause.

That said, indexes do have a second use. If all the columns in a query exist in an index, then the query can ignore the actual table, and use the index as if it were the table itself. this is called a covering index.

Let's assume we now have another non-clustered index on our table, on the columns acct_expir_date, username, id, and access_level, in that order. And, we have the query:

SELECT `username`, `access_level`, `acct_expir_date`
  FROM `users`
 WHERE `acct_expir_date` >= '2017-07-01'
   AND `acct_expir_date` < '2017-08-01'
;

This query would probably use our index even if the SELECT column list was * (all columns); however, since our index contains all the columns in the query, the users table itself will not be touched - the query will simply use the information in the index to provide the requested data.

So, in your case, the columns in the index up to the primary key are there to be used for the index's first use - locating records. The remaining columns may be there for the second use; providing a covering index.

NOTE: as stated by Balazs Papp in the comments, columns to the right of the primary key in the index can be used to help identify specific rows in a query where a range of primary key values are of interest; however, each individual row in the index would have to be checked, since the rows cannot be sorted on anything to the right of the primary key (since it's unique). Since the index is already in memory, narrowing down the actual selection of rows further via the index would indeed probably be faster than loading the full records and then narrowing things down; however, if the extra columns are going to be used in searches regularly, they might be more useful to the left of any unique columns, where they can narrow things down more quickly.

UPDATE: tables that can be loaded into memory in their entirety don't generally need covering indexes. If your table is not all that large, then any performance gain from the indexes may be lost in maintaining them and storing them. Consider the size of the table overall, the size of the rows of the table vs. the rows of the index, and how often queries are performed that might use the index, vs. how often the data in those columns is added or updated. If the indexes seem like they might be causing more harm than good, you may want to consider dropping them. However, do so with great care; if there is some critical query they're boosting performance on, removing them could cause significant issues. I would remove no more than one per month or so; over the course of a month of normal business, whatever use the index might have should be encountered. And, only do so with the knowledge of key people in your department; not just your boss, but someone who deals with reports of performance issues, so any problems that may arise can be resolved. Also, keep in mind how long it would take to recreate the index; there may actually be cases where creating the index, using it for one monthly report, then dropping it would still give a performance benefit, without the maintenance and (long-term) storage costs.

UPDATE 2: For the sake of (improved) completeness, as joanolo mentions in the comments, indexes also have a third general use - they do sort the values in indexes columns by those columns. If that sort matches your ORDER BY clause, the engine can avoid having to sort the records by retrieving them in the necessary order to begin with. This isn't particularly relevant to the OP's question, since a unique column means that any values to the right in the index aren't being sorted (because the position of the rows is fixed once the index reaches a unique column), but it is true.

Best Answer

Related Solutions

Postgresql table with one integer column, sorted index, with duplicate primary key

Sql-server – Non-Clustered Index key columns to cover a variable where clause

Related Question