I have 2 tables, a users table and a log table which literally logs each time a user signs in.
What I'm trying to do is grab a list of all users with a certain rank who have not logged in for over a year. I have a working query, but it takes ~5 minutes to execute.
Here's what I'm currently working with:
SQLFiddle
SELECT u.usrid, u.username, u.rank, u.extras, l.dtime
FROM users AS u
JOIN (
    SELECT userid, MAX(datetime) AS dtime
    FROM log
    GROUP BY userid
) AS l ON u.usrid = l.userid
WHERE (u.rank = 'P' OR u.extras LIKE '%W%')
  AND l.dtime < DATE_SUB(NOW(), INTERVAL 1 YEAR)
ORDER BY l.dtime DESC
It's part of a very old and soon-to-be-updated system… but until then, I'm trying to make the best of what I've been given to work with. I don't have access to alter the database schema, so I'm hoping there is a way to optimize the query itself.
Furthermore, the users table has ~500k rows while the log table has ~4m rows. With that many rows, I'm also wondering if it's a size issue. If so, what suggestions could I pass on to the DBA to improve its ability to scale? I know I could simply store the last log-in date somewhere separate for this particular use case, but we use that log data for a lot of things, so we need a way to search it efficiently.
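For what it's worth, the "store the last log-in date somewhere separate" idea could be passed to the DBA without touching the existing tables, e.g. as a small summary table maintained alongside the log. This is only a sketch; the table and column names here are hypothetical, not part of the actual schema:

```sql
-- Hypothetical summary table: one row per user, holding the latest sign-in.
CREATE TABLE last_login (
    userid INT NOT NULL PRIMARY KEY,
    dtime  DATETIME NOT NULL,
    INDEX (dtime)            -- lets "inactive for a year" queries range-scan
);

-- Upsert performed on every sign-in, in addition to the normal log INSERT:
INSERT INTO last_login (userid, dtime)
VALUES (?, NOW())
ON DUPLICATE KEY UPDATE dtime = NOW();
```

With this in place, the "inactive users" query would read 500k small rows instead of aggregating 4m log rows, while the full log stays available for everything else.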
Best Answer
Plus (I assume usrid is the PRIMARY KEY in users.)

Is size an issue? Yes and no. With the right indexes and queries, etc., billion-row tables can work fine. In your case, it is dubious... The subquery will probably produce 500K distinct users from its GROUP BY. Then the HAVING will shrink the number. What's left needs to be looked up in users and further filtered. Finally, another sort (ORDER BY) before delivering the results.

I think that newer versions of MySQL will leapfrog through the INDEX I suggested, landing on only 500K index rows, not all 4M.

If you think rank and extras do better filtering than MAX(datetime), then a different approach might be faster.