Let's say I have a table that has a record of every change a customer has made to their subscription for an internet service:
+-----+------------+-----------+------------+
| _id | account_id | plan_name | created_at |
+-----+------------+-----------+------------+
| 1 | a | basic | 10-01-19 |
| 2 | b | standard | 10-15-19 |
| 3 | c | free | 10-16-19 |
| 4 | a | standard | 10-16-19 |
| 5 | b | free | 10-20-19 |
+-----+------------+-----------+------------+
I'd like to perform an analysis on this data so that I can see the count of plans for every day, from 10-01-19 through 10-20-19.
To perform this analysis (among other things) I'd like to create a query that will give me a table with a row for every date for every user from their first recorded plan_name through present day (let's say this is 10-20-19). It might look something like this:
+-----+----------+------+-----------+
| _id | date | user | plan_name |
+-----+----------+------+-----------+
| 1 | 10-1-19 | a | basic |
| 2 | 10-2-19 | a | basic |
| 3 | 10-3-19 | a | basic |
| 4 | ... | ... | ... |
| 5 | 10-15-19 | a | standard |
| 6 | 10-16-19 | a | standard |
| 7 | 10-17-19 | a | standard |
| 8 | 10-15-19 | b | standard |
| 9 | 10-16-19 | b | standard |
| 10 | 10-17-19 | b | standard |
| 11 | ... | ... | ... |
| 12 | 10-20-19 | b | free |
+-----+----------+------+-----------+
My first test attempt at this was the following:
WITH dates AS (
SELECT (GETDATE()::DATE - ROW_NUMBER() OVER (ORDER BY TRUE))::date AS date
FROM pg_catalog.pg_operator
LIMIT 500)
SELECT date,
(case when account_id is not null then account_id else lag(account_id ignore nulls) over (order by date) end) as account_id,
(case when plan_name is not null then plan_name else lag(plan_name ignore nulls) over (order by date) end) as plan_name,
(case when start_date is not null then start_date else lag(start_date ignore nulls) over (order by date) end) as start_date
FROM dates
LEFT JOIN (
select * from customer_subscriptions
where account_id = '98asd7gf98a7') AS a
on dates.date = trunc(a.start_date)
ORDER BY date DESC
With the caveat that I tried it on a single account_id from my subscriptions table — and it worked! My thought process was generate a date range and LEFT JOIN the subscriptions table on it at the "change over" dates and then use a lag function to fill in the NULLs
However, when i remove the where account_id = '98asd7gf98a7'
it no longer works and I am left with a table that simply is a joining the start dates to a date from the range, without any of the lag filler in between.
Any thoughts on another way to approach this or changes I can make to my current approach?
Best Answer
Non-optimized solution:
fiddle