MySQL – How does a compound index help speed up a query?

Say we have a MySQL table with a bunch of columns, including:

Latitude Longitude

Say we create a compound index on (Latitude, Longitude).

Well, the index will sort by Latitude first and, only when two Latitude values are equal, by Longitude.
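The Python sketch below shows that ordering. It treats the index as a sorted list of (latitude, longitude) tuples, and the coordinates are made-up sample values. Python compares tuples element by element, which mirrors how a B-Tree orders a compound key:

```python
# Hypothetical sample points; each index entry is a (latitude, longitude) key.
points = [
    (40.7128, -74.0060),
    (34.0522, -118.2437),
    (40.7128, -73.9000),  # same latitude as the first point
    (51.5074, -0.1278),
]

# Tuples compare on the first element and only fall back to the second
# element on a tie, just like the compound key (Latitude, Longitude).
index = sorted(points)

for lat, lng in index:
    print(lat, lng)
```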

Something seems fishy here: Latitude values are rarely, if ever, exactly the same.

Say we want to find all points whose (Lat, Long) is "in a box". How on earth can such an index be useful at all?

Note: I am trying to understand how indexing works. I am not trying to snobbishly say that compound indexes are stupid. It doesn't have to be a full answer; any pointers are fine.

More importantly, I suspect that a query will only use one index, and that the only way for a query to use multiple indexes is to create a compound index.

Best Answer

In most cases indexes are B-Tree structures (or some variant of one). Some DBMSs also support other index types.

As you are talking about longitude/latitude, PostgreSQL with its GiST indexes comes to mind. Oracle has bitmap indexes in addition to B-Tree indexes. I'm sure SQL Server and DB2 also have some special index types. And then there are full-text indexes, which make searching text very efficient.

A B-Tree index is very efficient at finding a specific value; think of a primary key index where all values are different. If the index only contains the PK column(s) (i.e. it is not a clustered index), then looking up a row by a specific PK value typically takes no more than roughly 3-4 I/O operations (at least in Oracle): 2-3 to find the index block and one more to read the whole row. This gets more efficient if the index contains all the columns the query needs, so that the lookup of the actual table row is not needed. The term for that is "covering index" or "index-only retrieval".
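Here is a rough Python model of that arithmetic. The fanout of 500 entries per index block is a hypothetical figure, since real fanout depends on key size and block size. The model also ignores caching, which in practice keeps the upper levels of the tree in memory:

```python
def btree_levels(row_count: int, fanout: int) -> int:
    """Smallest number of levels L such that fanout**L >= row_count.

    Simplified model: every index block holds `fanout` entries.
    """
    levels = 1
    capacity = fanout
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels


def lookup_reads(row_count: int, fanout: int, covering: bool) -> int:
    """Estimated block reads for one equality lookup through an index."""
    # One read per index level, plus one table-row read unless the index covers the query.
    return btree_levels(row_count, fanout) + (0 if covering else 1)


print(lookup_reads(100_000_000, 500, covering=False))
print(lookup_reads(100_000_000, 500, covering=True))
```

With these assumed numbers, 100 million rows need 3 index levels. That gives 4 reads for a normal lookup and 3 for an index-only retrieval.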

For range lookups (e.g. WHERE foo > 42) an index is very helpful as well, because in most DBMSs the index can also be scanned for all entries matching a predicate. Usually this is somewhat less efficient than a direct lookup. How much less depends heavily on the DBMS and on whether an index-only retrieval is possible.
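Applied to the "points in a box" question, a range scan on a (Latitude, Longitude) index can be sketched in Python as follows. It is a simplified model that uses a sorted list of tuples with `bisect` rather than a real B-Tree, and the sample data is hypothetical. Only the leading column narrows the scan to one contiguous slice; the Longitude condition is then just a filter on the scanned entries:

```python
from bisect import bisect_left, bisect_right


def box_query(index, lat_min, lat_max, lng_min, lng_max):
    """Find points inside a lat/lng box using a sorted (lat, lng) "index".

    Returns (matches, entries_examined).
    """
    # Latitude, the leading column, narrows the scan to one contiguous slice.
    lo = bisect_left(index, (lat_min, float("-inf")))
    hi = bisect_right(index, (lat_max, float("inf")))
    scanned = index[lo:hi]
    # Longitude cannot narrow a latitude *range*; it only filters scanned entries.
    matches = [(lat, lng) for lat, lng in scanned if lng_min <= lng <= lng_max]
    return matches, len(scanned)


# Hypothetical sample data.
index = sorted([
    (10.0, 5.0), (10.5, 50.0), (11.0, 6.0),
    (11.5, 7.0), (12.0, 80.0), (13.0, 5.5),
])
matches, examined = box_query(index, 10.0, 12.0, 0.0, 10.0)
print(matches, examined)
```

Here 5 entries are examined to return 3 matches. The narrower the latitude band, the less of the index has to be read.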

I don't know of any DBMS that cannot use more than one index in a query. Think of a join on a PK and an FK column: depending on the data distribution, the DBMS might use one index to find the parent rows (PK lookup) and another to find the child rows (FK lookup).
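The Python sketch below illustrates one query using two indexes. Dictionaries stand in for the PK and FK B-Tree indexes, and the tables, column names and data are hypothetical:

```python
from collections import defaultdict

# Hypothetical parent and child tables.
parents = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
children = [
    {"id": 10, "parent_id": 1},
    {"id": 11, "parent_id": 2},
    {"id": 12, "parent_id": 1},
]

# Stand-in for the PK index: id -> parent row.
pk_index = {row["id"]: row for row in parents}

# Stand-in for the FK index: parent_id -> list of child rows.
fk_index = defaultdict(list)
for row in children:
    fk_index[row["parent_id"]].append(row)


def parent_with_children(parent_id):
    """PK lookup for the parent, then FK lookup for its children."""
    parent = pk_index.get(parent_id)
    if parent is None:
        return None, []
    # .get() avoids inserting an empty entry into the defaultdict.
    return parent, fk_index.get(parent_id, [])


print(parent_with_children(1))
```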

But not all DBMSs can use more than one index on the same table in a single query.
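A DBMS that can combine two single-column indexes on the same table might do it roughly like this Python sketch. MySQL calls one such strategy "index merge intersection". The rows, columns and function names here are hypothetical. Each index yields a set of candidate row ids, and the result is their intersection. Note that both candidate sets have to be read in full first, which is one reason a single composite index is often faster:

```python
from collections import defaultdict

# Hypothetical rows: row_id -> (sensor_id, status)
rows = {
    1: ("a", "ok"),
    2: ("a", "fail"),
    3: ("b", "ok"),
    4: ("a", "ok"),
}


def build_index(position):
    """Single-column 'index': column value -> set of row ids."""
    index = defaultdict(set)
    for row_id, values in rows.items():
        index[values[position]].add(row_id)
    return index


sensor_index = build_index(0)
status_index = build_index(1)


def index_merge_intersect(sensor_id, status):
    """Combine two single-column indexes by intersecting their row-id sets."""
    candidates_a = sensor_index.get(sensor_id, set())
    candidates_b = status_index.get(status, set())
    return sorted(candidates_a & candidates_b)


print(index_merge_intersect("a", "ok"))
```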

In the end, whether or not an index is used depends on many factors.

I can highly recommend http://use-the-index-luke.com/, which is a very good introduction to indexing across all major DBMSs.

DBMS specific information:

Oracle: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
PostgreSQL: http://www.postgresql.org/docs/current/static/indexes.html

Related Solutions

MySQL – How to speed up this query

Here is your original query:

SELECT
    user_actions.*,
    users.twitter_username,
    users.email
FROM
    `user_actions`  
    INNER JOIN users
    ON (user_actions.user_id=users.id)
    ORDER BY timestamp DESC
    LIMIT 0, 30
;

The first thing I notice is that you are joining two whole tables. Since you only need twitter_username and email from the users table, you should only bring three columns from users into the join: id, twitter_username and email.

The second thing is the LIMIT clause: it is applied after the join, but you should apply it before the join. In your case, you are requesting the 30 most recent user actions. If you can guarantee that only 30 rows are retrieved from user_actions, the join should run much faster.

If you read the answer from @DTest, his first two bullet points already tell you what's wrong with the query, given the steps MySQL takes to gather data from each table. The key is to understand what the temp tables will look like while the query is being processed and where that data will reside (memory or disk).

What you need to do is refactor the query to fool the MySQL Query Optimizer into producing smaller temp tables. In most cases, config changes in my.cnf should make a dramatic difference. In other cases, such as this one, refactoring the query may be sufficient.

Here is my proposed change to your query that should work faster:

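SELECT
    ua.*,
    u.twitter_username,
    u.email
FROM
    -- only the 30 most recent actions take part in the join
    (SELECT * FROM `user_actions`
    ORDER BY timestamp DESC LIMIT 30) ua
    LEFT JOIN
    -- only the three users columns the result needs
    (SELECT id,twitter_username,email FROM `users`) u
    ON (ua.user_id=u.id)
-- the outer ORDER BY is what guarantees the final row order
ORDER BY ua.timestamp DESC
;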

Here are the reasons for refactoring the query:

REASON #1

In the inline table ua, I retrieve only 30 rows using LIMIT, no matter how big the user_actions table gets. Because ORDER BY timestamp DESC is applied before the LIMIT, those 30 rows are the most recent actions.
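The Python sketch below models why limiting before the join saves work. The data is hypothetical, and the model only counts join probes into users; it does not model MySQL's temp tables or sorting. When every user_id exists in users, both orders return the same 30 rows, but the first one probes users once per action:

```python
# Hypothetical data: a users lookup keyed by id and 1000 user actions.
users = {
    1: {"twitter_username": "@alice", "email": "alice@example.com"},
    2: {"twitter_username": "@bob", "email": "bob@example.com"},
}
user_actions = [
    {"id": i, "user_id": 1 + i % 2, "timestamp": i} for i in range(1000)
]


def join_then_limit(limit):
    """INNER JOIN every action, then sort and keep `limit` rows."""
    probes = 0
    joined = []
    for action in user_actions:
        probes += 1  # one lookup into users per action
        user = users.get(action["user_id"])
        if user is not None:
            joined.append({**action, **user})
    joined.sort(key=lambda row: row["timestamp"], reverse=True)
    return joined[:limit], probes


def limit_then_join(limit):
    """Pick the newest `limit` actions first, then LEFT JOIN only those."""
    newest = sorted(user_actions, key=lambda a: a["timestamp"], reverse=True)[:limit]
    missing = {"twitter_username": None, "email": None}  # LEFT JOIN: keep orphans
    joined = [{**a, **users.get(a["user_id"], missing)} for a in newest]
    return joined, len(newest)


rows_a, probes_a = join_then_limit(30)
rows_b, probes_b = limit_then_join(30)
print(rows_a == rows_b, probes_a, probes_b)
```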

REASON #2

The inline table u contains only id, twitter_username and email. The id is needed to implement the join; the other two are the only users columns the result needs.

REASON #3

I use LEFT JOIN instead of INNER JOIN for two reasons:

Preserve the order of the query based on ua (strictly speaking, only an ORDER BY on the outer query guarantees the final order)
Display all user actions, even if the user_id in ua no longer exists in the users table.

Doing these things will force the temp tables to be smaller. Nevertheless, you will still need to implement bullet point #3 from @DTest's answer to keep the temp tables from landing on disk.

MySQL – Is this compound index unnecessary?

It might be worth posting the table definition from your other question for clarity.

The composite index is doing a few things for you:

Enforcing uniqueness on (sensor_id, timestamp), as you know. I'm unsure whether this is an important data integrity constraint.
Allowing queries that filter on both columns to look up matching rows by using a single index. MySQL can answer some queries (equality conditions on multiple columns are the ones I know about) by merging two indexes, but this tends to be significantly slower compared to using a single composite index.
The index can also be used to search on a leftmost prefix of its columns, but not on a suffix. So in this case it could help a query that filters on sensor_id, or on sensor_id and timestamp, but not on timestamp alone.
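The leftmost-prefix rule can be sketched in Python as a rule of thumb. The function name is hypothetical, and the sketch only considers equality filters. It ignores that a range condition on one column stops later columns from narrowing the search. It also ignores full index scans and MySQL 8.0.13+'s skip scan, which in some cases can use the index without a condition on the leading column:

```python
def usable_prefix(index_columns, equality_columns):
    """Number of leading index columns usable for equality lookups."""
    used = 0
    for col in index_columns:
        if col not in equality_columns:
            break  # a gap in the prefix stops the lookup from going further
        used += 1
    return used


idx = ("sensor_id", "timestamp")
print(usable_prefix(idx, {"sensor_id"}))
print(usable_prefix(idx, {"sensor_id", "timestamp"}))
print(usable_prefix(idx, {"timestamp"}))  # 0: no lookup on timestamp alone
```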

There are a number of caveats to this, so it's a good idea to look at the EXPLAIN output for your queries and verify which indexes they use. Also keep in mind that indexes can support the read part of UPDATE and DELETE queries, as well as JOINs, GROUP BY, ORDER BY, and other operations I'm neglecting.

An example of a scenario where the composite index is unnecessary would be if you don't care about the uniqueness constraint and all your queries filter on timestamp or sensor_id, but not both.

The single-column index on sensor_id is actually redundant since the composite index on (sensor_id, timestamp) can be used by the same queries, but still you might find that some queries perform faster when doing scans on the single-column index compared to using a composite index with a wider key. The difference might not be enough to matter, though, and some testing will probably be required to find out.

In addition to looking at the EXPLAIN output for your queries, tools such as pt-index-usage from Percona Toolkit, or the INFORMATION_SCHEMA.INDEX_STATISTICS table if you're running Percona Server or MariaDB, can help you assess which indexes are actually being used.

References:

The Optimization That (Often) Isn’t: Index Merge Intersection

Practical MySQL indexing guidelines