MySQL – How to Calculate Growth Rate

MySQLmysql-5.5

I'm trying to calculate purchase/profit growth rate in different time periods, This is the query i've come up with (View in SQLFiddle):

Schema:

CREATE TABLE `profit_log` (
  `market_id` smallint(6) NOT NULL,
  `total_purchase` bigint(20) NOT NULL,
  `total_profit` bigint(20) NOT NULL,
  `company_profit` bigint(20) NOT NULL,
  `date` date NOT NULL,
  `tid` bigint(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

Query:

SELECT a.`market_id`,
    SUM(a.`total_purchase`) as acc_purchase,
    SUM(a.`total_profit`) as acc_profit,
    SUM(a.`company_profit`) as acc_company_profit,
    ((b.`total_purchase` - c.`total_purchase`)/ c.`total_purchase`) * 100 as growth,
    ((d.`total_purchase` - e.`total_purchase`)/ e.`total_purchase`) * 100 as growth_week,
    ((f.`total_purchase` - g.`total_purchase`)/ g.`total_purchase`) * 100 as growth_month,
    ((d.`total_profit` - e.`total_profit`)/ e.`total_profit`) * 100 as growth_week_profit,
    ((f.`total_profit` - g.`total_profit`)/ g.`total_profit`) * 100 as growth_month_profit
FROM `profit_log` as a
JOIN `profit_log` as b on a.`market_id` = b.`market_id`
    AND b.`date` = '2016-05-05'
JOIN `profit_log` as c on a.`market_id` = c.`market_id`
    AND c.`date` = '2016-05-01'
JOIN `profit_log` as d on a.`market_id` = d.`market_id`
    AND d.`date` >= ('2016-05-05' - INTERVAL 7 DAY) AND d.`date` < ('2016-05-05' + INTERVAL 1 DAY)
JOIN `profit_log` as e on a.`market_id` = e.`market_id`
    AND e.`date` >= ('2016-05-05' - INTERVAL 14 DAY) AND e.`date` < ('2016-05-05' - INTERVAL 7 DAY)
JOIN `profit_log` as f on a.`market_id` = f.`market_id`
    AND f.`date` >= ('2016-05-05' - INTERVAL 1 MONTH) and f.`date` < ('2016-05-05' + INTERVAL 1 DAY)
JOIN `profit_log` as g on a.`market_id` = g.`market_id`
    AND g.`date` >= ('2016-05-05' - INTERVAL 2 MONTH) AND g.`date` < ('2016-05-05' - INTERVAL 1 MONTH)
WHERE a.`date` >= '2016-05-01'
AND a.`date` < ('2016-05-05' + INTERVAL 1 DAY)
AND a.`market_id` IN (0,20)
GROUP BY a.`market_id`

The Strange thing is after all this JOINs, SUMs (i.e SUM(a.total_purchase)) return unbelievably large numbers, it seems that values are accumulated over all this joins.

Can you please explain why is that happening (maybe i don't understand JOIN behavior), and help me to rewrite this query the proper way (6 self joins doesn't look right).

Thanks.

Best Answer

Another way you can do this is by summing each criterion by a CASE statement:

SELECT market_id,
       SUM(CASE WHEN a = 1 THEN total_purchase END) AS acc_purchase,
       SUM(CASE WHEN a = 1 THEN total_profit END) AS acc_profit,
       SUM(CASE WHEN a = 1 THEN company_profit END) AS acc_company_profit,
       ((SUM(CASE WHEN b = 1 THEN total_purchase END) - SUM(CASE WHEN c = 1 THEN total_purchase END)) / SUM(CASE WHEN c = 1 THEN total_purchase END)) * 100 AS growth,
       ((SUM(CASE WHEN p = 'd' THEN total_purchase END) - SUM(CASE WHEN p = 'e' THEN total_purchase END)) / SUM(CASE WHEN p = 'e' THEN total_purchase END)) * 100 AS growth_week,
       ((SUM(CASE WHEN p = 'f' THEN total_purchase END) - SUM(CASE WHEN p = 'g' THEN total_purchase END)) / SUM(CASE WHEN p = 'g' THEN total_purchase END)) * 100 AS growth_month,
       ((SUM(CASE WHEN p = 'd' THEN total_profit END) - SUM(CASE WHEN p = 'e' THEN total_profit END)) / SUM(CASE WHEN p = 'e' THEN total_profit END)) * 100 AS growth_week_profit,
       ((SUM(CASE WHEN p = 'f' THEN total_profit END) - SUM(CASE WHEN p = 'g' THEN total_profit END)) / SUM(CASE WHEN p = 'g' THEN total_profit END)) * 100 AS growth_month_profit
FROM (
    SELECT profit_log.*,
           CASE WHEN date >= @f THEN 1 END AS a,
           CASE WHEN date = @d THEN 1 END AS b,
           CASE WHEN date = @f THEN 1 END AS c,
           CASE WHEN date >= @d - INTERVAL 7 DAY THEN 'd'
                WHEN date >= @d - INTERVAL 14 DAY THEN 'e'
                WHEN date >= @d - INTERVAL 1 MONTH THEN 'f'
                WHEN date >= @d - INTERVAL 2 MONTH THEN 'g' END AS p
    FROM (SELECT @d := DATE'2016-05-05', @f := @d - INTERVAL DAY(@d)-1 DAY) AS params,
         profit_log
    WHERE date BETWEEN @d - INTERVAL 2 MONTH AND @d
          AND market_id IN (0, 20)
  ) a
GROUP BY market_id

In the case above, @d is the anchor date, and @f is the first day of the month in @d. The calculated columns a, b and c denote entries valid (value 1) since the start of the month, for the anchor day only, and for the first day of the month only, respectively. The calculated column p has values d, e, f and g, for last 7 days, last 14 days, last month, and last two months, respectively. Notice that values a, b, and c can overlap with each other and with d, e, and f (thus needing their individual flags), but values d, e, f and g never overlap, so are folded in a single column.

With these values (performed primarily for readability of the outer SELECT statement), all the values are SUMs of values conforming to some criteria (either a, b, c, d, e, f, or g). If none of the values overlapped, the query could be simplified significantly.