MySQL – How to Remove Using Temporary in Join Optimization

join;MySQLoptimization

I have what seems like a simple query, and my indexes seem to be sufficient without the INNER JOIN, but after I add the join, I end up seeing "Using Temporary". Would also appreciate any good resources for learning more about query optimization!

Query:

SELECT SQL_CALC_FOUND_ROWS DISTINCT m.id 
FROM messages m 
INNER JOIN message_entries t2
    ON m.id = t2.message_id
WHERE m.user_id = 1234 
    AND m.type = 1 
ORDER BY m.sent_on DESC;

The messages table has a covering index for (user_id, type, sent_on, created) and it uses this index for the query, and the message_entries table has an index on message_id (through foreign key).

I've tried adding id to the covering index on the messages table (at the beginning or end of the index, i've tried both), but it still chooses to use the original index for user_id, type, sent_on, created.

What can I do to improve performance when adding joins to the base query?

Best Answer

Step 1. Get rid of the JOIN.

The main purpose of t2 (the way the query is written) is to see if there are amy message_entries for the given message. Do you really need that test? If not, get rid of the JOIN

Step 2. Get rid of the DISTINCT.

The secondary purpose of the JOIN (should you decide to keep it) is to create multiple rows, thereby necessitating the DISTINCT. But, since it is only testing for the existence of at least one 'entry', toss the JOIN and add this to the WHERE:

AND EXISTS ( SELECT 1 FROM message_entries WHERE m.id = message_id )

Step 3: Better INDEX (Ok, you already have a good index)

INDEX(user_id, type, sent_on)

This may allow for getting through the WHERE and on to the ORDER BY. Note: The ORDER BY caused the temp and filesort.

Note: A temp and/or filesort is not evil in itself; it is just a sign of either a complex query (where it is necessary) or a missing index or an inefficiently formulated query.

Huh? Since you don't have a LIMIT, why do you have SQL_CALC_FOUND_ROWS? Count the number of rows in the resultset to get the "found rows".

Related Solutions

Mysql – Help optimizing MySQL slow query

I would like to get rid of "Using temporary; Using filesort"

One of the problems I see is that you're using different GROUP BY and ORDER BY clauses. From the manual on how MySQL uses temporary tables:

If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

As soon as you create a temporary table, it will need to be sorted according to your ORDER BY clause, indicated by 'using filesort'.

This execution plan at leasts uses the indexes to appropriately limit the number of rows found.

I would also look through the docs on ORDER BY optimization.

Postgresql – Really slow DISTINCT ON query with multiple joins

Because the WHERE condition of the query involves only equality checks:

WHERE "songs"."processed" = 't' 
  AND "songs"."working" = 't'

and then you have:

SELECT  DISTINCT ON (songs.rank, songs.shared_id) ...

which is similar to GROUP BY songs.rank, songs.shared_id

I would first try adding a compound index on (first the columns in WHERE, then the columns in DISTINCT ON):

(processed, working, rank, shared_id)

The ordering: ORDER BY rank DESC may be better optimized if you have the index as:

(processed, working, rank DESC, shared_id)

Not really sure if this would contribute to efficiency but you can test.

Addition by @Erwin

^{As per request in comment}

In principal (default) b-tree indexes can be scanned forward and backward at the same speed. But sorting can make a difference in multi-column indexes where you combine the sort order of multiple columns. The query starts with:

SELECT  DISTINCT ON (songs.rank, songs.shared_id)

In combination with ORDER BY rank DESC this dictates that the result be ordered by rank DESC, shared_id effectively. After the (simplified) WHERE clause WHERE processed AND working has been applied and before LIMIT can be applied.
I have my doubts if the DISTINCT clause is actually useful. But while it is there, the optimal index for the query should be (just as @ypercube suspected):

CREATE INDEX songs_special_idx
ON songs (processed, working, rank DESC, shared_id);

Looks like one of the rare cases where explicit ordering of index columns would benefit the query. There is an excellent explanation in the chapter Indexes and ORDER BY of the manual.

If the WHERE condition is stable (always WHERE processed AND working), a partial multi-column index would be smaller and faster, yet:

CREATE INDEX songs_special_idx
ON songs (rank DESC, shared_id)
WHERE processed AND working;

Best Answer

Related Solutions

Mysql – Help optimizing MySQL slow query

Postgresql – Really slow DISTINCT ON query with multiple joins

Addition by @Erwin

Related Question