MySQL – Why is This Simple Join Taking So Long?

innodbmyisamMySQLperformance

I'm trying to run some reporting queries on a MySQL database, and I'm running into serious performance issues for some reason. One table, page_views, is extremely large (~70 million records), so some slowness is to be expected, but it's got lots of indexing, and all the queries are restricted to a certain date range on the started_at field. I'm seeing simple joins taking upwards of ten minutes. The query I've been tossing around lately (which I've stripped way down) looks like this:

select count(page_views.id) as views
from playpack_media
join page_views on page_views.itemId=playpack_media.media_Id
where playpack_media.playpack_id = 1
    and page_views.started_at BETWEEN '2014-06-23' and '2014-07-07'

It's taking about 11 minutes 30 seconds to run. DESCRIBE returns this analysis:

| id | select_type | table          | type | possible_keys        | key         | key_len | ref                                 | rows | Extra       |
+----+-------------+----------------+------+----------------------+-------------+---------+-------------------------------------+------+-------------+
|  1 | SIMPLE      | playpack_media | ref  | media_id,playpack_id | playpack_id | 4       | const                               |  594 |             |
|  1 | SIMPLE      | page_views     | ref  | itemId,started_at    | itemId      | 4       | db_playsmrt.playpack_media.media_id | 4236 | Using where |

It seems to bottleneck at the "copying to tmp table" stage. I've tried increasing tmp_table_size and max_heap_table_size to 256M and key_buffer_size to 64M – the latter change dropped the time from ~18 minutes to ~11, which is a big improvement but still terrible, and a further increase to 128M didn't do much.

One thing I noticed while hammmering at this is that page_views uses MyISAM and playpack_media uses InnoDB, which I'm reading can be an issue. However, changing page_views to InnoDB on my (much smaller) development server makes it dramatically slower. It's possible that it becomes worthwhile on larger tables, but just making the experiment would take a long time.

Does anyone have an idea what's going on here? What else should I be looking at?

Best Answer

Look at the original query

select count(page_views.id) as views
from playpack_media
join page_views on page_views.itemId=playpack_media.media_Id
where playpack_media.playpack_id = 1
    and page_views.started_at BETWEEN '2014-06-23' and '2014-07-07'

You should do a couple of things.

First, create to compound indexes

ALTER TABLE page_views     ADD INDEX started_at_itemid_ndx    (started_at,itemid);
ALTER TABLE playpack_media ADD INDEX playpack_id_media_Id_ndx (playpack_id,media_Id)

Try to refactor the query

SELECT COUNT(1) as views FROM
(
    SELECT media_Id FROM playpack_media
    WHERE playpack_id = 1
) A INNER JOIN
(
    SELECT itemId FROM page_views WHERE
    started_at BETWEEN '2014-06-23' and '2014-07-07'
) B ON A.media_Id = B.itemId;

Perhaps breaking down the views by ItemID

SELECT IFNULL(A.media_id,'Total') Media,COUNT(1) as views FROM
(
    SELECT media_Id FROM playpack_media
    WHERE playpack_id = 1
) A INNER JOIN
(
    SELECT itemId FROM page_views WHERE
    started_at BETWEEN '2014-06-23' and '2014-07-07'
) B ON A.media_Id = B.itemId
GROUP BY A.media_id WITH ROLLUP;