Mysql – CROSS JOIN of list of dates and another table

MySQL

I have a query for getting data from a certain table by date range and grouping by week. My CROSS JOIN intends to fill in a default value for each week where there are no results for the date range.

I can then execute this query.

    SELECT 
        SUM(invoice.amount) AS "invoice.amount", 
        CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
    FROM invoice
    CROSS JOIN (
        SELECT selected_date 
        FROM (
            SELECT ADDDATE('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
            FROM
                (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
                (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
                (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
                (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
                (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4
        ) v
        WHERE selected_date BETWEEN '2018-01-01' AND '2018-01-31'
        GROUP BY selected_date, YEAR(selected_date), WEEK(selected_date)
    ) calendar
    WHERE invoice.deleted_at IS NULL
        AND invoice.updated_at >= '2018-01-01'
        AND invoice.updated_at <= '2018-01-31'
        AND invoice.status = "PAID"
    GROUP BY calendar.selected_date, invoice.id, invoice.amount, YEAR(invoice.updated_at), WEEK(invoice.updated_at)

Assume I have these records in the database:

+----+------------+------------+------------+
| id |   amount   |   status   | updated_at | 
+----+------------+------------+------------+
|  1 |    1000    |    PAID    | 2018-01-01 |
|  2 |    2000    |    PAID    | 2018-01-01 |
|  3 |    100     |    PAID    | 2018-01-07 |
|  4 |    50      |    PAID    | 2018-01-11 |
+----+------------+------------+------------+

I expect to see these results, one record for every week of January:

+--------+-------------------+
| amount |     updated_at    | 
+--------+-------------------+
|  3100  |  Jan 1  - Jan 7   |
|  50    |  Jan 8  - Jan 15  |
|  0     |  Jan 16 - Jan 22  |
|  0     |  Jan 23 - Jan 30  |
|  0     |  Jan 31 - Jan 31  |
+--------+-------------------+

However, I get like 50 of these random duplicated results, the least of which contains the joined filler weeks since there are no 0 amounts:

+--------+----------------+
| amount |   updated_at   | 
+--------+----------------+
|  1000  | Jan 1 - Jan 7  |
|  2000  | Jan 1 - Jan 7  |
|  100   | Jan 1 - Jan 7  |
|  50    | Jan 8 - Jan 15 |
|  1000  | Jan 1 - Jan 7  |
|  1000  | Jan 1 - Jan 7  |
|  2000  | Jan 1 - Jan 7  |
|  2000  | Jan 1 - Jan 7  |
|  100   | Jan 1 - Jan 7  |
|  50    | Jan 8 - Jan 15 |
|  100   | Jan 1 - Jan 7  |
|  50    | Jan 8 - Jan 15 |
|  ...   | ...            |
|  ...   | ...            |
|  ...   | ...            |
+--------+----------------+

What gives?

Update:

Relevant pieces of the DDL

CREATE TABLE invoice (
    id     INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    status VARCHAR(255) NOT NULL INDEX,
    amount INT(10) UNSIGNED NOT NULL,
    --
    created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP INDEX,
    deleted_at TIMESTAMP NULL DEFAULT NULL,
    --
    PRIMARY KEY (id)
);

Update 2:

This is my data:

+--------+--------+---------------------+
| amount | status | updated_at          |
+--------+--------+---------------------+
|    100 | PAID   | 2018-12-23 23:05:02 |
|    100 | PAID   | 2018-12-26 18:00:04 |
|    150 | PAID   | 2018-12-27 08:43:03 |
|    200 | PAID   | 2018-12-29 08:30:03 |
+--------+--------+---------------------+

This is my full query (no redactions):

    SELECT 
        SUM(invoice.amount) AS "invoice.amount", 
        CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
    FROM invoice
    JOIN account_invoice
        ON account_invoice.account_id = 1
        AND account_invoice.invoice_id = invoice.id
    CROSS JOIN (
        SELECT selected_date 
        FROM (
            SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
            FROM
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
        ) v
        WHERE v.selected_date >= "2018-12-05 00:00:00" AND v.selected_date < "2019-01-04 23:59:59"
        GROUP BY FLOOR(TO_DAYS(v.selected_date) - 2) / 7
    ) calendar
    WHERE invoice.deleted_at IS NULL
        AND invoice.updated_at >= "2018-12-05 00:00:00"
        AND invoice.updated_at < "2019-01-04 23:59:59"
        AND invoice.status = "PAID"
    GROUP BY FLOOR(TO_DAYS(invoice.updated_at) - 2) / 7

My result set:

{{3000} {Dec 23 - Dec 30}}
{{3000} {Dec 26 - Jan 02}}
{{4500} {Dec 27 - Jan 03}}
{{6000} {Dec 29 - Jan 05}}

I expect to see these results instead:

{{0} {Dec 05 - Dec 11}}
{{0} {Dec 12 - Jan 18}}
{{100} {Dec 19 - Dec 25}}
{{350} {Dec 26 - Jan 01}}
{{0} {Jan 02 - Jan 04}}

My result set is not accurate. Those numbers are basically the sum of the numbers of the duplicates I was getting before, crunched down into just 4 rows. Also, the week dates are off; as you can see, they start at the first day of the first record found (Dec 23), and end at Jan 05, 1 day after the date range date. >.<

Update 3:

Running this query now:

    SELECT 
        SUM(invoice.amount) AS "invoice.amount", 
        CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
    FROM invoice
    JOIN account_invoice
        ON account_invoice.account_id = 1
        AND account_invoice.invoice_id = invoice.id
    JOIN (
        SELECT selected_date 
        FROM (
            SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
            FROM
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
        ) v
        WHERE v.selected_date >= "2018-12-05 00:00:00" AND v.selected_date < "2019-01-04 23:59:59"
    ) calendar
        ON calendar.selected_date = DATE(invoice.updated_at)
    WHERE invoice.deleted_at IS NULL
        AND invoice.updated_at >= "2018-12-05 00:00:00"
        AND invoice.updated_at < "2019-01-04 23:59:59"
        AND invoice.status = "PAID"
    GROUP BY FLOOR(TO_DAYS(invoice.updated_at) - 2) / 7

New reuslt set is getting better, looks like this:

{{100} {Dec 23 - Dec 30}}
{{100} {Dec 26 - Jan 02}}
{{150} {Dec 27 - Jan 03}}
{{200} {Dec 29 - Jan 05}}

So the dates are still off, and the SUM function doesn't seem to be working, but now we're getting better numbers at least.

Update 4:

Got my result set to work properly now, however, the start date is offset by 1 day, and the ending is not quite correct cause it overlaps the end date.

Here is my query:

    SELECT 
        IF(invoice.amount IS NULL, 0, SUM(invoice.amount)) AS "invoice.amount", 
        CONCAT(DATE_FORMAT(calendar.selected_date, '%b %e'), ' - ', DATE_FORMAT(DATE_ADD(calendar.selected_date, INTERVAL 6 DAY), '%b %e')) AS "invoice.updated_at"
    FROM (
        SELECT selected_date 
        FROM (
            SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
            FROM
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
                (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
        ) v
        WHERE v.selected_date >= ? AND v.selected_date < ?
    ) calendar
    LEFT JOIN invoice
        ON invoice.deleted_at IS NULL
        AND DATE(invoice.updated_at) = calendar.selected_date
        AND invoice.status = "PAID"
        AND EXISTS (SELECT 1 FROM account_invoice WHERE account_id = 1 AND invoice_id = invoice.id)
    GROUP BY YEAR(calendar.selected_date), WEEK(calendar.selected_date)

And here is my result:

{{0} {Dec 6 - Dec 12}}
{{0} {Dec 9 - Dec 15}}
{{0} {Dec 16 - Dec 22}}
{{550} {Dec 23 - Dec 29}}
{{0} {Dec 30 - Jan 5}}
{{0} {Jan 1 - Jan 7}}

Very close…but there are two problems:

  1. The list of results should start at the first day of the date range, but it starts one day after the first day
  2. The list of results should end at the last day of the date range, even if it hasn't been fully a week yet at the end. But it still adds an entire week (likely because of my GROUP CONCAT)

This is what I want to see:

{{0} {Dec 5 - Dec 11}}
{{0} {Dec 12 - Dec 18}}
{{0} {Dec 19 - Dec 25}}
{{550} {Dec 26 - Jan 1}}
{{0} {Jan 1 - Jan 4}}

Best Answer

First, a bug: When crossing a year boundary, YEAR + WEEK will split into two "weeks". That is easily fixed by revamping the grouping:

FLOOR(TO_DAYS(...)/7)

provides a "week number" that avoids the problem. (You probably need FLOOR(TO_DAYS(...) + n)/7), where n is picked to shift to Sunday or Monday, as needed.)

That expression would be used for the GROUP BY.

On to the bug... The only thing that should be in the GROUP BY is the "week", not the rest of the columns; they are causing the duplication.

More...

  • Get rid of the inner `GROUP BY.
  • Don't do a CROSS JOIN; join ONv.selected_date = DATE(invoice.updated_at)`

(See if that gets closer to what you want. There are several optimizations beyond that.)

  • Do the GROUP BY based on FLOOR(TO_DAYS(v.selected_date) - 2) / 7
  • Consider changing the GROUP_CONCAT to use MIN(v.selected_date) and MAX(v.selected_date) so that you get the week range, not random updated_at values.

Change

JOIN account_invoice
    ON account_invoice.account_id = 1
    AND account_invoice.invoice_id = invoice.id

to

AND EXISTS ( SELECT 1 FROM account_invoice
                      WHERE account_id = 1
                        AND invoice_id = invoice.id )