I would like to get rid of "Using temporary; Using filesort"
One of the problems I see is that you're using different GROUP BY
and ORDER BY
clauses. From the manual on how MySQL uses temporary tables:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
As soon as you create a temporary table, it will need to be sorted according to your ORDER BY
clause, indicated by 'using filesort'.
This execution plan at leasts uses the indexes to appropriately limit the number of rows found.
I would also look through the docs on ORDER BY optimization.
Here is your original query:
SELECT
user_actions.*,
users.twitter_username,
users.email
FROM
`user_actions`
INNER JOIN users
ON (user_actions.user_id=users.id)
ORDER BY timestamp
DESC LIMIT 0, 30
;
First thing I notice is that you are joining two whole tables. Since you only need twitter_username
and email
from the users
table, you should only join from users
using three columns : id
, twitter_username
and email
.
Second thing is the LIMIT
clause. It is being executed after the join. You should execute it before the join. In your case, you are requesting for the 30 most recent user actions. If you can guarantee that only 30 rows are retreived from user_actions
, the join should operate much faster.
If you read the answer from @DTest, his first two bulletpoints already tell you what's wrong the query because of the actions mysql will take in gathering data from each table. The key is to understand what the temp tables will look like while the query is being processed and where the data will reside (memory or disk).
What you need to do is refactor the query to fool the MySQL Query Optimizer. Force the query to produce smaller temp tables. In most cases, config changes in my.cnf should make a dramamtic difference. In other cases, such as this one, refactoring the query may be sufficient.
Here is my proposed change to your query that should work faster:
SELECT
ua.*,
u.twitter_username,
u.email
FROM
(SELECT * FROM `user_actions`
ORDER BY timestamp DESC LIMIT 30) ua
LEFT JOIN
(SELECT id,twitter_username,email FROM `users`) u
ON (ua.user_id=u.id)
;
Here are reasons for refactoring the query:
REASON #1
If you look at inline table ua
, I retrieve only 30 rows using LIMIT
. This will happen no matter how big the user_actions
table gets. It is already ordered because the ORDER BY timestamp DESC
happens before the LIMIT
.
REASON #2
If you look inline table u
, it has id
,twitter_username
,email
. The id
is needed to implement the join.
REASON #3
I use LEFT JOIN
instead of INNER JOIN
for two(2) reasons:
- Preserve the order of the query based on
ua
- Display all user actions in case the user_id in the
ua
no longer exists in the users
tables.
Doing these things will force the temp tables to be smaller. Nevertheless, you will still need implement bulletpoint #3 from @DTest's answer to preempt having temp tables land on disk.
Best Answer
What are you trying to achieve here? I am not a MySQL programmer; however, as I understand SQL in general, every column that is not listed in the GROUP BY clause must be part of an aggregate like
SELECT *
in a grouping query will not work in most (if not all) SQL implementations.UPDATE
In order to get the row with the highest id for each group you would have to embed the query above in an "outer" query.
This query is deterministic and should work with most SQL dialects.
Some query engines perform better with joins than with "IN subquery". You can give this a try