MySQL Performance – Troubleshooting Slow SQL Query with LEFT JOIN

MySQLperformancequery-performance

What my query is doing:

I have two SQL tables and I want to copy datarows from the one table to the other table. I am doing this for statistics. The source table includes all current member-sessions/logins and with my statistics table I also want to see member-logins of the past. Thus I want to store these data into my bot_sessions_statistics table as well to make sure they are not being updated anymore:

The original query:

INSERT INTO bot_sessions_statistics (member_id, session_token, username, ip_address)
SELECT sessions.member_id, sessions.session_token, sessions.username, sessions.ip_address
FROM bot_sessions AS sessions
LEFT JOIN
(
    SELECT stats.member_id, stats.session_token, stats.username, stats.ip_address
    FROM bot_sessions_statistics as stats
    WHERE date_active >= date(NOW())
) AS stats
ON sessions.member_id = stats.member_id
WHERE latest_renewal >= date(NOW())
AND stats.member_id IS NULL

The performance of the original query:

Query_time: 86.364613  Lock_time: 0.000085 Rows_sent: 0  Rows_examined: 1088312551

The table structures:

Table bot_sessions:
Table bot_sessions

Table bot_sessions_statistics:
Table bot_sessions_statistics

SHOW CREATE of both tables: http://pastebin.com/UUzEPX5v

The performance of single queries:

SELECT sessions.member_id, sessions.session_token, sessions.username, sessions.ip_address
FROM bot_sessions AS sessions
WHERE latest_renewal >= date(NOW())

Returns 44.2k rows (Duration: 0.078s / Fetch: 1.607s)

SELECT stats.member_id, stats.session_token, stats.username, stats.ip_address
FROM bot_sessions_statistics as stats
WHERE date_active >= date(NOW())

Returns 23.3k rows (Duration: 0.047s / Fetch 0.920s)

Best Answer

As a general rule, run (and post the results here) an EXPLAIN on your query to see if indexes are being used or not. At first, I see there are no other indexes on the sessions table except PKs, this will almost surely make your query run slow, since the WHERE will cause the RDBMS do a full table scan to find the rows.
I guess your query will at least require an index on latest_renewal (but, again, check with EXPLAIN).

This is my understanding of your query, on which I based my simplification:

SELECT sessions.member_id
    ,sessions.session_token
    ,sessions.username
    ,sessions.ip_address
FROM bot_sessions AS sessions
LEFT JOIN ( /* */
    SELECT stats.member_id
        ,stats.session_token
        ,stats.username
        ,stats.ip_address
    FROM bot_sessions_statistics AS stats
    WHERE date_active >= DATE (NOW()) /* a. I think this gives all of 'today' statistics */
    ) AS stats ON sessions.member_id = stats.member_id /* b. you're joining on 'member_id' which is NOT NULL in sessions, so any row from this subquery will have a value for (stats.)member_id */
WHERE latest_renewal >= DATE (NOW())
    AND stats.member_id IS NULL /* c. this will filter out all the rows from the JOIN, since no stats.member_id can be NULL */

Next, at first sight (if I'm not mistaken) I think your SELECT query can be rewritten without the LEFT JOIN, like this:

SELECT sessions.member_id, sessions.session_token, sessions.username, sessions.ip_address
FROM bot_sessions AS sessions
WHERE 
    NOT EXISTS (
        SELECT 1 FROM bot_sessions_statistics
        WHERE date_active >= date(NOW())
    )
AND latest_renewal >= date(NOW())

If the above is really correct, compare the results of EXPLAIN on both.