Mysql – Slow subquery in SELECT statement

MySQLoptimizationsubquery

Having tried many different methods, I can't get my query taking shorter than 40 seconds to run with a dataset of ~11000 rows.

The table is a list of hotel bookings, containing (most relevant) Check-In date and Booking Value

My boss wanted a diagram where the values are summed for every month/year in a special way:

Row 1: Sum of value between Jan 2012 - Jan 2012
Row 2: Sum of value between Jan 2012 - Feb 2012
Row 3: Sum of value between Jan 2012 - Mar 2012
Row 4: Sum of value between Jan 2012 - Apr 2012
...
Row 5: Sum of value between Jan 2012 - Dec 2012

And so on for every year. Basically row 5 in this example would return the summed value for the entire year.

Table:

CREATE TABLE IF NOT EXISTS `bookings` (
  `DBID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `checkin` date NOT NULL,
  `checkout` date NOT NULL,
  `total` int(11) NOT NULL,
  PRIMARY KEY (`DBID`,`checkin`),
  KEY `total` (`total`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=11171 ;

My query:

SELECT
    YEAR(checkin) as year,
    CONCAT(UCASE(LEFT(MONTHNAME(checkin), 1)),SUBSTRING(MONTHNAME(checkin), 2)) as month,
    MONTH(checkin) as monthnum,
    (
        SELECT SUM(total)
        FROM bookings as book2
        WHERE YEAR(book2.checkin) = year AND MONTH(book2.checkin) <= monthnum
    ) as sum
FROM `bookings`
GROUP BY YEAR(checkin),MONTH(checkin)
ORDER BY YEAR(checkin) ASC, MONTH(checkin) ASC

So the subquery is the magic behind this query, it all produces the exact result I want, but the execution time is unacceptable.

I'm not sure how to optimise this further, I can't use joins since it's on the same table, I've indexed everything neccessary, etc.

Best Answer

Try rewriting the sub-query as a join:

SELECT YEAR(b1.checkin) as year
     , MONTH(b1.checkin) as monthnum,
     , SUM(b2.total)
FROM bookings as b1
JOIN bookings as b2
    ON YEAR(b2.checkin) = YEAR(b1.checkin) 
   AND MONTH(b2.checkin) <= MONTH(b1.monthnum)
GROUP BY YEAR(b1.checkin),MONTH(b1.checkin)
ORDER BY YEAR(b2.checkin) ASC, MONTH(b2.checkin) ASC

I also removed CONCAT(UCASE(LEFT(MONTHNAME(checkin), 1)),SUBSTRING(MONTHNAME(checkin), 2)) as month, I don't think that will matter but it might be beneficial to do this once, and add it to the report in the application-layer.

You don't mention version of MySQL, but later versions have generated columns. You can try generating columns for YEAR and MONTH:

CREATE TABLE IF NOT EXISTS bookings 
( DBID int unsigned NOT NULL AUTO_INCREMENT
, checkin date NOT NULL
, checkout date NOT NULL
, total int NOT NULL

, checkin_year SMALLINT as (YEAR(checkin))
, checkin_month SMALLINT as (MONTH(checkin)) 

,    PRIMARY KEY (DBID, checkin)
,    KEY total (total)

,    KEY (checkin_year, checkin_month)

) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=11171 ;

and then use those columns in the query:

SELECT b1.checkin_year
     , b1.checkin_month,
     , SUM(b2.total)
FROM bookings as b1
JOIN bookings as b2
    ON b2.checkin_year = b1.checkin_year 
   AND b2.checkin_month <= b1.checkin_month
GROUP BY b1.checkin_year, b1.checkin_month
ORDER BY b1.checkin_year, b1.checkin_month

BTW, what is the purpose of the index on TOTAL?