Mysql – Left join on a varchar column

MySQLperformancequery-performance

I know this is probably something stupid on my part, but please bear with me. I have basically 3 tables (let's simplify things):

users
user_id | account_id | phone | name | …

accounts
account_id | name | …

phonedetails
phone | country_code | …

Let's pretend we have a simple query like this:

SELECT u.*, a.* FROM users u
LEFT JOIN accounts a ON a.account_id = u.account_id
ORDER BY u.user_id DESC
LIMIT 100

This correctly assesses that the query needs to use eq_ref on account_id and properly selects only 100 records and is almost instant. However, when I want to do this (this is a simplified version of what I want to use):

SELECT u.*, p.* FROM users u
LEFT JOIN phonedetails p ON p.phone = u.phone
ORDER BY u.user_id DESC
LIMIT 100

It suddenly tries to go through Full Table Scan (so it's about 300 000 records at the moment) and then does its job. Sure, it works, but it takes almost 4 seconds for whatever reason!

account_id => type int in both tables (users, accounts)
phone => varchar(24) in both tables (users, phonedetails)

It doesn't seem to matter if I have an index on that column or not.
Funny thing is that if I do something like this:

SELECT u.*, (SELECT p.country_code FROM phonedetails p WHERE p.phone = u.phone)
FROM users u
ORDER BY u.user_id DESC
LIMIT 100

It correctly joins the tables using eq_ref and it's fast.

Is there something I can do to improve on the query performance?

Edit with more info:

Interesting (at least for me), is that if I execute this query:

SELECT u.* FROM users u
LEFT JOIN phonedetails p ON p.phone = u.phone
ORDER BY u.user_id DESC
LIMIT 100

Explain shows those tables are joined correctly with eq_ref on the primary key and it's fast again. Of course, it's useless for me as I don't get the columns I need.

Best Answer

OK, I'll stop trying to figure out why your query is not working and give you a technique for solving it...

SELECT  u.*, p.*
    FROM  
       ( SELECT  user_id   -- minimal data (just PK in this case)
            FROM  users
            ORDER BY  user_id
            LIMIT  100    -- now in subquery
        ) AS x
    JOIN  users AS u  ON u.user_id = x.user_id
    JOIN  phonedetails AS p  ON p.phone = u.phone
    ORDER BY  x.user_id   -- yes, need extra ORDER BY

The general principle: When doing a LIMIT or GROUP BY and the query is even somewhat complex, it may be better to build a subquery that identifies which rows as simply as possible. Then do the JOIN, etc, to get the rest of the data.

  1. The subquery (called a "derived" table in this syntax) finds the user_ids -- and only the 100 needed. Since you are doing LEFT, there is no need to involve phonedetails in this query.
  2. Then reach out to the other tables to get the desired columns.

Caveat: This version of the query may be slow for the same yet-to-be-discovered reason. This reformulation is effective in some situations.