Mysql – How to make the SQL query more efficient

MySQL

I use mysql under Linux. My table is about 8000000 rows. The table has index(advertiser_id,activity_date),(activity_date,advertiser_id,activity_type).
The table is looks as following:

activity_date        advertiser_id activity_type message
2014-06-01 00:00:16  80149         11            Raised max bid               
2014-06-01 00:00:16  80149         12            Set Bid Management 
2014-06-01 00:01:49  80149         11            Raised max bid               
2014-06-01 00:01:49  80149         12            Set Bid Management 
2014-06-01 00:03:17  37            12            Lowered daily budget              
2014-06-01 00:03:17  59            11            Lowered max bid                           
2014-06-01 00:03:17  59            14            Raised daily budget

Each row is an activity. I try to write SQL for each time one aid of activity_type:11(UPDATE_AD_BID), count the all other actions of this aid within one day. if one aid have a activity_type:11(UPDATE_AD_BID) at 2014-03-14 16:21:02, I need all the activity of this aid from 2014-03-13 16:21:02 to 2014-03-14 16:21:02. The SQL query I wrote is as following

SELECT bid_change.advertiser_id,bid_change.activity_date,
    CASE bid_change.activity_type WHEN 11 THEN IF(bid_change.message Like '%Raised%', 1, -1) ELSE 0 END as bid,
    sum(CASE change_type.activity_type WHEN 11 THEN 1 ELSE 0 END) as UPDATE_AD_BID,
    sum(CASE change_type.activity_type WHEN 12 THEN 1 ELSE 0 END) as UPDATE_AD_BUDGET,
    sum(CASE change_type.activity_type WHEN 14 THEN 1 ELSE 0 END) as CREATIVE,
FROM (select * from post_funnel) as bid_change  
LEFT JOIN (select * from post_funnel) as change_type 
ON bid_change.advertiser_id = change_type.advertiser_id 
AND change_type.activity_date BETWEEN bid_change.activity_date - INTERVAL 1 DAY AND bid_change.activity_date -INTERVAL 1 SECOND
WHERE bid_change.activity_type = 11
GROUP BY bid_change.advertiser_id, bid_change.activity_date;

The SQL work fine with small test data set. But when I use it with large data set. It took forever to run. Is there anyone I can make the SQL more efficient? or I need to look for a big data solution with Hadoop or so?

update:
output should be as following

activity_date advertiser_id bid  UPDATE_AD_BID UPDATE_AD_BUDGET CREATIVE
2014-06-01 00:00:16  80149  1    0             0                0       
2014-06-01 00:01:49  80149  1    1             1                0 
2014-06-01 00:03:17  59     -1   0             0                0

Best Answer

Some notes:

  • No need for the derived tables. Most, if not all MySQL versions will try to materialize the subresults of the derived tables and the subsequent join operations will not be able to use any indexes. Check the EXPLAIN output to verify this. Moreover, if the resulting tables do not fit in memory, they will be saved temporarily into disk, causing more delay.

  • I think that two indexes, on (activity_type, advertiser_id, activity_date) and on (advertiser_id, activity_date, activity_type) will be more useful than those you have. The first will be used for the bid_change and the second for joining the post_funnel.
    If the message column is not some huge varchar or text datatype, including it in the 1st (only) above index will further improve efficiency.

  • I strongly suggest you don't use BETWEEN for date and datetime comparisons but inclusive-exclusive ranges (qt >= start_datetime AND qt < end_datetime).

  • (detail) No need to use CASE bid_change.activity_type WHEN 11 in the SELECT when you have WHERE bid_change.activity_type = 11 in the query.

  • (detail) You can rewrite the other CASE using either COUNT() instead of SUM() or using the SUM(condition) mysql trick which makes them a bit simpler:

In all, the query can be rewritten:

SELECT 
    bid_change.advertiser_id,
    bid_change.activity_date,
    IF(bid_change.message LIKE '%Raised%', 1, -1) AS bid,
    SUM(change_type.activity_type = 11)           AS update_ad_bid,
    SUM(change_type.activity_type = 12)           AS update_ad_budget,
    SUM(change_type.activity_type = 14)           AS creative
FROM 
    post_funnel AS bid_change  
  LEFT JOIN
    post_funnel AS change_type 
      ON  change_type.advertiser_id = bid_change.advertiser_id
      AND change_type.activity_date >= bid_change.activity_date - INTERVAL 1 DAY 
      AND change_type.activity_date < bid_change.activity_date
WHERE 
    bid_change.activity_type = 11
GROUP BY 
    bid_change.advertiser_id, 
    bid_change.activity_date ;