I have a balances
table in PostgreSQL 9.3 that looks like this:
CREATE TABLE balances (
user_id INT
, balance INT
, as_of_date DATE
);
INSERT INTO balances (user_id, balance, as_of_date) VALUES
(1, 100, '2016-01-03')
, (1, 50, '2016-01-02')
, (1, 10, '2016-01-01')
, (2, 200, '2016-01-01')
, (3, 30, '2016-01-03');
It only contains balances for dates that a user has made a transaction. I need it to contain a row for each user with their balance on each date in a given date range.
- If the user does not have a row for a given date in the range, I need to use their balance from the day before.
- If the user created their account after a given date in the range, I need to avoid creating a row for that user/date combination.
I can reference an accounts
table to get users' create_date
:
CREATE TABLE accounts (
user_id INT
, create_date DATE
);
INSERT INTO accounts (user_id, create_date) VALUES
(1, '2015-12-01')
, (2, '2015-12-31')
, (3, '2016-01-03');
My desired result looks like this:
+---------+---------+--------------------------+
| user_id | balance | as_of_date |
+---------+---------+--------------------------+
| 1 | 100 | 2016-01-03T00:00:00.000Z |
| 1 | 50 | 2016-01-02T00:00:00.000Z |
| 1 | 10 | 2016-01-01T00:00:00.000Z |
| 2 | 200 | 2016-01-03T00:00:00.000Z |
| 2 | 200 | 2016-01-02T00:00:00.000Z |
| 2 | 200 | 2016-01-01T00:00:00.000Z |
| 3 | 30 | 2016-01-03T00:00:00.000Z |
+---------+---------+--------------------------+
Note that rows have been added for user 2 for 2016-01-02
and 2016-01-03
, carrying over the previous balance from 2016-01-01
; and that no rows have been added for user 3, who was created on 2016-01-03
.
To generate a series of dates in a date range, I know I can use:
SELECT d.date FROM GENERATE_SERIES('2016-01-01', '2016-01-03', '1 day'::INTERVAL) d
…but I am struggling with LEFT JOIN
ing that series with each set of rows grouped by user_id
.
Best Answer
1.
CROSS JOIN
,LEFT JOIN LATERAL
to subqueryReturns your desired result - except that
as_of_date
is an actualdate
, not atimestamp
like in your example. That should be more appropriate.Users that are created already, but don't have any transactions, yet, are listed with a balance of 0. You did not define how to deal with the corner case.
Rather use
timestamp
input forgenerate_series()
:It's crucial for performance that you back this up with a multicolumn index:
We have had a very similar case on SO just this week:
Find more explanation there.
2.
CROSS JOIN
,LEFT JOIN
, window functionsSame result. If you have the multicolumn index mentioned above and can get index-only scans out of it, the first solution is most probably faster.
The main feature is the running count of values to form groups. since count() does not count NULL values, all dates without balance fall into the same group (
grp
) as the most recent balance. Then use a simplemax()
over the same window frame extended bygrp
to copy the last balance for dangling gaps.Related: