Here is your original query:
SELECT
  user_actions.*,
  users.twitter_username,
  users.email
FROM
  `user_actions`
INNER JOIN users
  ON (user_actions.user_id = users.id)
ORDER BY timestamp DESC
LIMIT 0, 30
;
The first thing I notice is that you are joining two whole tables. Since you only need twitter_username and email from the users table, you should only join the users table using three columns: id, twitter_username and email.
The second thing is the LIMIT clause. It is being executed after the join; you should execute it before the join. In your case, you are requesting the 30 most recent user actions. If you can guarantee that only 30 rows are retrieved from user_actions, the join should operate much faster.
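To make the "LIMIT before the join" argument concrete, here is a toy model in plain Python (not MySQL) that counts how many users-table lookups each strategy performs. The data, the dict-based "tables" and the probe counter are all hypothetical; in MySQL the sort step would ideally be served by an index on timestamp, which this sketch does not model.

```python
"""Toy model of join-then-limit vs limit-then-join (hypothetical data).

Plain Python dicts stand in for the user_actions and users tables; each
users-table lookup is counted as one join probe.
"""
from typing import Dict, List, Tuple

Row = Tuple[int, int, str]  # (action id, timestamp, email)


def join_then_limit(actions: List[dict], users: Dict[int, dict], n: int) -> Tuple[List[Row], int]:
    """Join every action to its user, then sort and keep the first n rows."""
    probes = 0
    joined: List[Row] = []
    for a in actions:
        probes += 1
        user = users.get(a["user_id"])
        if user is not None:  # INNER JOIN: drop actions without a user
            joined.append((a["id"], a["timestamp"], user["email"]))
    joined.sort(key=lambda r: r[1], reverse=True)  # ORDER BY timestamp DESC
    return joined[:n], probes


def limit_then_join(actions: List[dict], users: Dict[int, dict], n: int) -> Tuple[List[Row], int]:
    """Pick the n most recent actions first, then join only those."""
    latest = sorted(actions, key=lambda a: a["timestamp"], reverse=True)[:n]
    probes = 0
    result: List[Row] = []
    for a in latest:
        probes += 1
        user = users.get(a["user_id"])
        if user is not None:
            result.append((a["id"], a["timestamp"], user["email"]))
    return result, probes


if __name__ == "__main__":
    users = {i: {"id": i, "email": f"user{i}@example.com"} for i in range(100)}
    actions = [{"id": i, "user_id": i % 100, "timestamp": i} for i in range(10_000)]
    full, full_probes = join_then_limit(actions, users, 30)
    early, early_probes = limit_then_join(actions, users, 30)
    print(full == early, full_probes, early_probes)  # True 10000 30
```

The two strategies only return the same rows when each of the 30 most recent actions has a matching user; if some do not, limit-then-join with INNER JOIN semantics returns fewer than 30 rows, which is the trade-off REASON #3 below deals with.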
If you read the answer from @DTest, his first two bullet points already tell you what's wrong with the query, given the steps MySQL takes to gather data from each table. The key is to understand what the temp tables will look like while the query is being processed and where the data will reside (memory or disk).
What you need to do is refactor the query to fool the MySQL Query Optimizer into producing smaller temp tables. In most cases, config changes in my.cnf should make a dramatic difference. In other cases, such as this one, refactoring the query may be sufficient.
Here is my proposed change to your query that should work faster:
SELECT
  ua.*,
  u.twitter_username,
  u.email
FROM
  (SELECT * FROM `user_actions`
   ORDER BY timestamp DESC LIMIT 30) ua
LEFT JOIN
  (SELECT id, twitter_username, email FROM `users`) u
  ON (ua.user_id = u.id)
ORDER BY ua.timestamp DESC
;
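A minimal sketch for checking that the refactored query returns the same 30 rows as the original one, using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here: it verifies the result set, not MySQL's temp-table behaviour or speed. The column list, the sample data and the index idx_actions_ts are hypothetical, and the rewritten query carries an outer ORDER BY ua.timestamp DESC so the final order is guaranteed.

```python
import sqlite3

ORIGINAL = """
SELECT user_actions.*, users.twitter_username, users.email
FROM user_actions
INNER JOIN users ON (user_actions.user_id = users.id)
ORDER BY timestamp DESC
LIMIT 0, 30
"""

REWRITTEN = """
SELECT ua.*, u.twitter_username, u.email
FROM (SELECT * FROM user_actions ORDER BY timestamp DESC LIMIT 30) ua
LEFT JOIN (SELECT id, twitter_username, email FROM users) u
  ON (ua.user_id = u.id)
ORDER BY ua.timestamp DESC
"""


def build_db(n_users: int = 50, n_actions: int = 1000) -> sqlite3.Connection:
    """Create an in-memory database where every action belongs to an existing user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, twitter_username TEXT, email TEXT);
        CREATE TABLE user_actions (
            id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT, timestamp INTEGER);
        CREATE INDEX idx_actions_ts ON user_actions (timestamp);
    """)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(i, f"tw{i}", f"user{i}@example.com") for i in range(1, n_users + 1)],
    )
    # Unique timestamps so ORDER BY ... LIMIT is deterministic.
    conn.executemany(
        "INSERT INTO user_actions VALUES (?, ?, ?, ?)",
        [(i, i % n_users + 1, "click", 1_000_000 + i) for i in range(1, n_actions + 1)],
    )
    return conn


if __name__ == "__main__":
    db = build_db()
    before = db.execute(ORIGINAL).fetchall()
    after = db.execute(REWRITTEN).fetchall()
    print(before == after, len(after))
```

Because every action in this sample data has a user, INNER JOIN and LEFT JOIN give identical results; the two only diverge for orphaned user_id values.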
Here are the reasons for refactoring the query:
REASON #1
If you look at inline table ua, I retrieve only 30 rows using LIMIT. This will happen no matter how big the user_actions table gets. It is already ordered because the ORDER BY timestamp DESC happens before the LIMIT.
REASON #2
If you look at inline table u, it has only id, twitter_username and email. The id is needed to implement the join.
REASON #3
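I use LEFT JOIN instead of INNER JOIN for two (2) reasons:
- Make ua the preserved side of the join, so the rows and their order come from ua. Note, though, that SQL only guarantees output order when the outer query has its own ORDER BY, so the outer query should also end with ORDER BY ua.timestamp DESC.
- Display all user actions even if the user_id in ua no longer exists in the users table.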
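The second point can be seen with a tiny, hypothetical data set in which one action (id 11) points at a user_id (99) that no longer exists. This is a sketch using Python's sqlite3 module as a stand-in for MySQL; the join semantics shown are standard SQL, but the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, twitter_username TEXT, email TEXT);
CREATE TABLE user_actions (id INTEGER PRIMARY KEY, user_id INTEGER, timestamp INTEGER);
INSERT INTO users VALUES (1, 'alice_tw', 'alice@example.com'), (2, 'bob_tw', 'bob@example.com');
-- user 99 was deleted, but its action is still there
INSERT INTO user_actions VALUES (10, 1, 100), (11, 99, 200), (12, 2, 300);
""")


def latest_actions(join_type: str):
    """Run the refactored query with either an INNER or a LEFT join."""
    if join_type not in ("INNER", "LEFT"):  # only these two are interpolated
        raise ValueError(join_type)
    sql = f"""
    SELECT ua.id, ua.user_id, u.email
    FROM (SELECT * FROM user_actions ORDER BY timestamp DESC LIMIT 30) ua
    {join_type} JOIN (SELECT id, twitter_username, email FROM users) u
      ON (ua.user_id = u.id)
    ORDER BY ua.timestamp DESC
    """
    return conn.execute(sql).fetchall()


inner = latest_actions("INNER")
left = latest_actions("LEFT")
print(inner)  # action 11 is missing
print(left)   # action 11 is kept, with email None
```

With INNER JOIN the orphaned action silently disappears; with LEFT JOIN it stays in the list and its user columns come back as NULL (None in Python).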
Doing these things will force the temp tables to be smaller. Nevertheless, you will still need to implement bullet point #3 from @DTest's answer to keep temp tables from landing on disk.
It might be worth posting the table definition from your other question for clarity.
The composite index is doing a few things for you:
- As you know, enforcing uniqueness on (sensor_id, timestamp); I'm unsure whether this is an important data integrity constraint.
- Allowing queries that filter on both columns to look up matching rows using a single index. MySQL can answer some queries (equality conditions on multiple columns are the ones I know about) by merging two indexes, but this tends to be significantly slower than using a single composite index.
- The index can also be used to search on a leftmost prefix of its columns, but not on a right-hand subset. So in this case it could help a query that filters on sensor_id alone or on both sensor_id and timestamp, but not on timestamp alone.
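A minimal sketch of the leftmost-prefix rule and the uniqueness constraint, using Python's sqlite3 module with SQLite standing in for MySQL. SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, and its exact wording varies between SQLite versions. The table sensor_readings, its value column and the index name idx_sensor_ts are hypothetical. SQLite can sometimes use a "skip-scan" on a non-leading column once ANALYZE statistics exist; none are collected here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor_readings (sensor_id INTEGER, timestamp INTEGER, value REAL);
CREATE UNIQUE INDEX idx_sensor_ts ON sensor_readings (sensor_id, timestamp);
""")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?)",
    [(s, t, 0.0) for s in range(1, 11) for t in range(100)],
)


def plan(sql: str, params: tuple = ()) -> str:
    """Join the 'detail' column (index 3) of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Both columns, or only the leading column: the composite index is searched.
plan_both = plan("SELECT value FROM sensor_readings WHERE sensor_id = ? AND timestamp = ?", (3, 42))
plan_prefix = plan("SELECT value FROM sensor_readings WHERE sensor_id = ?", (3,))
# Only the second column: no usable prefix, so the table is scanned.
plan_suffix = plan("SELECT value FROM sensor_readings WHERE timestamp = ?", (42,))

# The UNIQUE index also rejects a second row for the same (sensor_id, timestamp).
try:
    conn.execute("INSERT INTO sensor_readings VALUES (3, 42, 1.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

for p in (plan_both, plan_prefix, plan_suffix):
    print(p)
print("duplicate rejected:", duplicate_rejected)
```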
There are a number of caveats to this, so it's a good idea to look at the EXPLAIN output for your queries and verify which indexes they're using. Also keep in mind that indexes can support the read part of UPDATE and DELETE queries, as well as JOINs, GROUP BY, ORDER BY, and other operations I'm neglecting.
An example of a scenario where the composite index is unnecessary would be if you don't care about the uniqueness constraint and all your queries filter on timestamp or sensor_id, but not both.
The single-column index on sensor_id is actually redundant, since the composite index on (sensor_id, timestamp) can be used by the same queries, but you might still find that some queries perform faster when scanning the single-column index compared to a composite index with a wider key. The difference might not be enough to matter, though, and some testing will probably be required to find out.
In addition to looking at the EXPLAIN output for your queries, tools such as pt-index-usage from Percona Toolkit, or the INFORMATION_SCHEMA.INDEX_STATISTICS table if you're running Percona Server or MariaDB, can help you assess which indexes are actually being used.
Indexes are in most cases a B-tree structure (or some variant of one). Some DBMS support other index types as well.
Since you are talking about longitude/latitude, PostgreSQL with its GiST indexes comes to mind. Oracle has bitmap indexes in addition to B-tree indexes. I'm sure SQL Server and DB2 also have some special index types. And then there are full-text indexes, which make searching text very efficient.
A B-tree index is very efficient at finding a specific value; think of a primary key index where all values are different. If the index contains only the PK column(s) (i.e. it is not a clustered index), then looking up a row by a specific PK value typically takes no more than (roughly) 3-4 I/O operations (at least with Oracle): 2-3 to find the index block and one more to read the whole row. This gets more efficient if the index contains additional columns so that the lookup of the actual table row is not needed. The term for that is "covering index" or "index-only retrieval".
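The "3-4 I/O operations" figure follows from B-tree height: with a fanout of f entries per block, h levels reach roughly f^h keys, so a lookup reads about h index blocks plus one table block unless the index is covering. This is a simplified sketch; the fanout of 500 is an assumed, hypothetical value (the real one depends on block and key size), and it ignores the fact that the root and upper levels are usually already cached in memory.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Smallest h with fanout**h >= n_keys (simplified: every node holds `fanout` entries)."""
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, reachable = 1, fanout
    while reachable < n_keys:
        levels += 1
        reachable *= fanout
    return levels


def lookup_ios(n_keys: int, fanout: int, covering: bool) -> int:
    """Index blocks read from root to leaf, plus one table-row read unless the index is covering."""
    return btree_levels(n_keys, fanout) + (0 if covering else 1)


if __name__ == "__main__":
    for rows in (1_000_000, 100_000_000, 1_000_000_000):
        print(rows, lookup_ios(rows, 500, covering=False), lookup_ios(rows, 500, covering=True))
    # 1000000 4 3
    # 100000000 4 3
    # 1000000000 5 4
```

Because the height grows logarithmically, going from a million to a billion rows adds only one level under these assumptions, and a covering index saves the final table read.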
Now, for "range lookups" (e.g. where foo > 42) an index is very helpful as well, since in most DBMS the index can also be scanned according to a predicate. Usually (again, this depends heavily on the DBMS) this is slightly less efficient than a direct lookup, and it also depends on the ability to do an "index-only retrieval".
I don't know of any DBMS that cannot use more than one index in a query. Think of a join on a PK and an FK column: depending on the data distribution, the DBMS might use one index to find the parent rows (PK lookup) and another to find the child rows (FK lookup).
But not all DBMS can use more than one index for the same table in a single query.
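A small sketch of the range-lookup, index-only-retrieval and PK/FK-join cases, using Python's sqlite3 module with SQLite standing in for the DBMS; other systems word their plans differently, and SQLite's own wording varies by version. The tables parent and child, the columns foo and note, and the index names are hypothetical. The query-plan strings are what is inspected; no data is needed for that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE child (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent (id),  -- FK column
    foo INTEGER,
    note TEXT
);
CREATE INDEX idx_child_foo ON child (foo);
CREATE INDEX idx_child_parent ON child (parent_id);
""")


def plan(sql: str) -> str:
    """Join the 'detail' column (index 3) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Range scan on the index, then a table lookup per row to fetch `note`.
range_plan = plan("SELECT note FROM child WHERE foo > 42")
# Same range, but only indexed columns are selected: index-only retrieval.
covering_plan = plan("SELECT foo FROM child WHERE foo > 42")
# PK lookup on parent plus FK lookup on child: one index per table in one query.
join_plan = plan(
    "SELECT p.name, c.note FROM parent AS p "
    "JOIN child AS c ON c.parent_id = p.id WHERE p.id = 7"
)

for p in (range_plan, covering_plan, join_plan):
    print(p)
```

In SQLite the covering case is reported as "USING COVERING INDEX", which corresponds to the "index-only retrieval" described above.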
After all, whether or not an index is used depends on a lot of things.
I can highly recommend http://use-the-index-luke.com/, which is a very good introduction to indexing across all major DBMS.
DBMS specific information:
Oracle: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
PostgreSQL: http://www.postgresql.org/docs/current/static/indexes.html