MySQL – How to Remove Using Temporary in Join Optimization

join;MySQLoptimization

I have what seems like a simple query, and my indexes seem to be sufficient without the INNER JOIN, but after I add the join, I end up seeing "Using Temporary". Would also appreciate any good resources for learning more about query optimization!

Query:

SELECT SQL_CALC_FOUND_ROWS DISTINCT m.id 
FROM messages m 
INNER JOIN message_entries t2
    ON m.id = t2.message_id
WHERE m.user_id = 1234 
    AND m.type = 1 
ORDER BY m.sent_on DESC;

The messages table has a covering index for (user_id, type, sent_on, created) and it uses this index for the query, and the message_entries table has an index on message_id (through foreign key).

I've tried adding id to the covering index on the messages table (at the beginning or end of the index, i've tried both), but it still chooses to use the original index for user_id, type, sent_on, created.

What can I do to improve performance when adding joins to the base query?

Best Answer

Step 1. Get rid of the JOIN.

The main purpose of t2 (the way the query is written) is to see if there are amy message_entries for the given message. Do you really need that test? If not, get rid of the JOIN

Step 2. Get rid of the DISTINCT.

The secondary purpose of the JOIN (should you decide to keep it) is to create multiple rows, thereby necessitating the DISTINCT. But, since it is only testing for the existence of at least one 'entry', toss the JOIN and add this to the WHERE:

AND EXISTS ( SELECT 1 FROM message_entries WHERE m.id = message_id )

Step 3: Better INDEX (Ok, you already have a good index)

INDEX(user_id, type, sent_on)

This may allow for getting through the WHERE and on to the ORDER BY. Note: The ORDER BY caused the temp and filesort.

Note: A temp and/or filesort is not evil in itself; it is just a sign of either a complex query (where it is necessary) or a missing index or an inefficiently formulated query.

Huh? Since you don't have a LIMIT, why do you have SQL_CALC_FOUND_ROWS? Count the number of rows in the resultset to get the "found rows".