Mysql – How to one calculate percentage from previous week in aggregate query

MySQLperformancequeryquery-performance

I have a challenge I set out to do that seemed initially trivial. Not so for my developper brain.

Consider the following simple view, used to validate a cron I wrote that queries a subset of 200 000 statements every saturday.

It goes as follows:

mysql> SELECT
    ->     DATE_FORMAT(s.created, "%Y-%m-%d") as "Date",
    ->     count(s.id) AS "Accounts credited",
    ->     sum(s.withdrawal) "Total Credited"
    ->   --  100 * (sum(s.withdrawal) - sum(prev.withdrawal)) 
         --   / sum(prev.withdrawal) "Difference in %"
    ->     FROM statements s
    -> --    LEFT JOIN prev
    -> --        s.created - interval 7 DAY
    ->  --     ON prev.created = s.created - interval 7 DAY 
        --     AND (prev.status_id = 'OPEN' 
        --     OR prev.status_id = 'PENDING')
    ->     WHERE (s.status_id = 'OPEN' OR  s.status_id = 'PENDING')
    ->     GROUP BY YEAR(s.created), MONTH(s.created), DAY(s.created)
    ->       ORDER BY s.created DESC
    ->     LIMIT 8;

+------------+-------------------+----------------+
| Date       | Accounts credited | Total Credited |
+------------+-------------------+----------------+
| 2019-01-19 |             18175 |        3173.68 |
| 2019-01-12 |             18135 |        4768.43 |
| 2019-01-05 |             17588 |        6968.49 |
| 2018-12-29 |             17893 |        5404.18 |
| 2018-12-22 |             17353 |        7048.18 |
| 2018-12-15 |             16893 |        7181.34 |
| 2018-12-08 |             16220 |        9547.09 |
| 2018-12-01 |             15476 |        7699.59 |
+------------+-------------------+----------------+
8 rows in set (0.79 sec)

As is, the query is efficient and practical. I merely would like to add a column, difference in percentage, from previous week's total, as seen with the — commented out code.

I have tried various approaches, but because of the GROUP BY, adding an inline column to get the sum(withdrawal) of previous week makes the query run … forever.

I then tried the LEFT JOIN approach, but this has the same problem, Obviously. I think the added JOIN has to fetch the sum of previous week for every row of the outer select.

I then had the (not so smart) idea of querying my view, even but then it seems I would have the same issue.

I assume there are much more optimal approaches out there to this simple task.

Is there an elegant way to calculate a percentage from such a query?

Would a stored procedure or some other 'non-plain-sql' approach be more optimal?

Best Answer

When you use MySQL >= 8.0, you can use CTEs (common table expressions)

 ;
WITH 
    (
        SELECT  DATE(s.created) as "Date",
                COUNT(s.id) AS "Accounts credited",
                SUM(s.withdrawal) "Total Credited"
            FROM  statements s
            WHERE  s.status_id IN ('OPEN', 'PENDING')
            GROUP BY  DATE(s.created)
            ORDER BY  s.created DESC 
    ) AS cte 
SELECT  cte.*,
        100 * (cte.`Total Credited` - prev.`Total Credited`) /
              prev.`Total Credited`  AS "Difference in %"
    FROM  cte
    LEFT JOIN  cte prev  ON cte.`Date` = prev.`Date` - INTERVAL 7 DAY;

If you use an earlier version, just replace the CTE with appropriate subqueries, like

SELECT <your_fancy_percentage_calculation> 
FROM (<cte from above>) a 
LEFT JOIN (<cte from above>) b ON a.Date = b.Date - INTERVAL 7 DAY;

MySQL is smart enough to not execute the same query twice, I think.

Or you create a view and use this.