Mysql – 4m Rows in MySQL, ~5 Minutes to Run Query

MySQLoptimizationperformance

I have 2 tables, a users table and a log table which literally logs each time a user signs in.

What I'm trying to do is grab a list of all users with a certain rank who have not logged in for over a year. That said, I have a working query, but it's taking ~5 minutes to execute.

Here's what I'm currently working with:
SQLFiddle

SELECT u.usrid, u.username, u.rank, u.extras, l.dtime 
FROM users AS u
JOIN (
    SELECT userid, MAX(datetime) dtime 
    FROM log 
    GROUP BY userid
) AS l
ON u.usrid = l.userid 
WHERE (u.rank = 'P' OR u.extras LIKE '%W%')
AND l.dtime < DATE_SUB(NOW(), INTERVAL 1 YEAR)
ORDER BY l.dtime DESC

It's a part of a very old and soon to be updated system… But until then, I'm trying to make the best of what I've been given to work with. That being said, I don't have access to actually alter the database schema so I'm hoping there is a way to optimize this query to work better.

Furthermore, the users table has ~500k rows while the log table has ~4m rows. With that many rows, I'm also wondering if it's a size issue. If so, what suggestions could I pass on to the DBA in order to improve it's ability to scale? I know I could simply store the last log in date somewhere separate for this particular use-case; but we use that log data for a lot of things so we need a way to be able to search it efficiently.

Best Answer

SELECT  u.usrid, u.username, u.rank, u.extras, x.dtime
    FROM  
      ( SELECT  userid, MAX(datetime) dtime
            FROM  log
            GROUP BY  userid
            HAVING  dtime < NOW() - INTERVAL 1 YEAR 
      ) AS x
    JOIN  users u ON u.usrid = x.userid
    WHERE  ( u.rank = 'P'
              OR  u.extras LIKE '%W%' 
           )
    ORDER BY  x.dtime DESC 

Plus

 INDEX(usrid, datetime) -- in `log`

(I assume usrid is the PRIMARY KEY in users.)

Is size an issue? Yes and no. With the right indexes and queries, etc, billion-row tables can work fine. In your case, it is dubious... The subquery will probably produce 500K distinct users from its GROUP BY. Then the HAVING will shrink the number. What's left needs to be looked up in users and further filtered. Finally another sort (ORDER BY) before delivering the results.

I think that newer versions of MySQL will leapfrog through the INDEX I suggested, landing on only 500K index rows, not all 4M.

If you think the rank and extras do a better filtering than MAX(datetime), then a different approach might be faster.