Postgresql – Return previous running total when value is null in a time series

group byjoin;postgresqlset-returning-functionswindow functions

This is a continuation of a previous question, where table definition and sample data can be found. (Huge thanks to @Erwin Brandstetter for the help there).

All of this is being done on a PostgreSQL 11.5 DB.

SELECT *
FROM   (
   SELECT the_day::date
   FROM   generate_series(timestamp '2020-01-01', date_trunc('day', localtimestamp), interval '1 day') the_day
   ) d 
LEFT   JOIN (
   SELECT customer_id
        , created_at::date AS the_day
        , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'tsla')) OVER w AS tsla_running_amount
        , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'goog')) OVER w AS goog_running_amount
   FROM   transactions t
   WHERE  created_at >= timestamp '2020-01-01'
   GROUP  BY customer_id, created_at::date
   WINDOW w AS (PARTITION BY customer_id ORDER BY created_at::date)
   ) t USING (the_day)
ORDER  BY customer_id, the_day;

When a customer has NO transactions on a day, I need to be able to display the "most recent" running total for that customer for that day.

For example:

2020-01-01: Customer purchases 5 "tsla" for a total balance of 5.
2020-01-02: Customer takes no action. Current total balance display as 5.
2020-01-03: Customer sells 2 "tsla", total balance now 3.
2020-01-04: Customer takes no action. Current balance still display as 3.

Currently, the way that this is set up, the record for a customer on a day where they have no transactions will return as a NULL row for balance totals.

How can I display their "most recent" running balance on days when they have no transactions?

This query will be running for all customers that exist within the transactions table (have had a transaction at least once in the past).

Best Answer

One way is to generate one row per (customer_id, the_day) before running the window function. Like:

SELECT c.customer_id, d.the_day
     , sum(t.tsla_amount) OVER w AS tsla_running_amount
     , sum(t.goog_amount) OVER w AS goog_running_amount
FROM   (
   SELECT the_day::date
   FROM   generate_series(timestamp '2020-01-01'
                        , date_trunc('day', localtimestamp)
                        , interval '1 day') the_day
   ) d 
CROSS  JOIN (SELECT DISTINCT customer_id FROM transactions) c  -- !
LEFT   JOIN (
   SELECT customer_id
        , created_at::date AS the_day
        , sum(t.amount) FILTER (WHERE stock_ticker = 'tsla') AS tsla_amount
        , sum(t.amount) FILTER (WHERE stock_ticker = 'goog') AS goog_amount
   FROM   transactions t
   WHERE  created_at >= timestamp '2020-01-01'
   GROUP  BY customer_id, created_at::date
   ) t USING (customer_id, the_day)
WINDOW w AS (PARTITION BY customer_id ORDER BY the_day)
ORDER  BY customer_id, the_day;

db<>fiddle here

If the table is big, and there are only relatively few distinct customers, you might want to optimize performance. See:

But you would typically have a separate table of customers to work with ...

If you want 0 instead of NULL in rows before the first transaction of a customer, add COALESCE. Like:

SELECT c.customer_id, d.the_day
     , COALESCE(sum(t.tsla_amount) OVER w, 0) AS tsla_running_amount
     , COALESCE(sum(t.goog_amount) OVER w, 0) AS goog_running_amount
FROM ...