Postgresql – Return previous running total when value is null in a time series

group byjoin;postgresqlset-returning-functionswindow functions

This is a continuation of a previous question, where table definition and sample data can be found. (Huge thanks to @Erwin Brandstetter for the help there).

All of this is being done on a PostgreSQL 11.5 DB.

SELECT *
FROM   (
   SELECT the_day::date
   FROM   generate_series(timestamp '2020-01-01', date_trunc('day', localtimestamp), interval '1 day') the_day
   ) d 
LEFT   JOIN (
   SELECT customer_id
        , created_at::date AS the_day
        , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'tsla')) OVER w AS tsla_running_amount
        , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'goog')) OVER w AS goog_running_amount
   FROM   transactions t
   WHERE  created_at >= timestamp '2020-01-01'
   GROUP  BY customer_id, created_at::date
   WINDOW w AS (PARTITION BY customer_id ORDER BY created_at::date)
   ) t USING (the_day)
ORDER  BY customer_id, the_day;

When a customer has NO transactions on a day, I need to be able to display the "most recent" running total for that customer for that day.

For example:

2020-01-01: Customer purchases 5 "tsla" for a total balance of 5.
2020-01-02: Customer takes no action. Current total balance display as 5.
2020-01-03: Customer sells 2 "tsla", total balance now 3.
2020-01-04: Customer takes no action. Current balance still display as 3.

Currently, the way that this is set up, the record for a customer on a day where they have no transactions will return as a NULL row for balance totals.

How can I display their "most recent" running balance on days when they have no transactions?

This query will be running for all customers that exist within the transactions table (have had a transaction at least once in the past).

Best Answer

One way is to generate one row per (customer_id, the_day) before running the window function. Like:

SELECT c.customer_id, d.the_day
     , sum(t.tsla_amount) OVER w AS tsla_running_amount
     , sum(t.goog_amount) OVER w AS goog_running_amount
FROM   (
   SELECT the_day::date
   FROM   generate_series(timestamp '2020-01-01'
                        , date_trunc('day', localtimestamp)
                        , interval '1 day') the_day
   ) d 
CROSS  JOIN (SELECT DISTINCT customer_id FROM transactions) c  -- !
LEFT   JOIN (
   SELECT customer_id
        , created_at::date AS the_day
        , sum(t.amount) FILTER (WHERE stock_ticker = 'tsla') AS tsla_amount
        , sum(t.amount) FILTER (WHERE stock_ticker = 'goog') AS goog_amount
   FROM   transactions t
   WHERE  created_at >= timestamp '2020-01-01'
   GROUP  BY customer_id, created_at::date
   ) t USING (customer_id, the_day)
WINDOW w AS (PARTITION BY customer_id ORDER BY the_day)
ORDER  BY customer_id, the_day;

db<>fiddle here

If the table is big, and there are only relatively few distinct customers, you might want to optimize performance. See:

Optimize GROUP BY query to retrieve latest row per user

But you would typically have a separate table of customers to work with ...

If you want 0 instead of NULL in rows before the first transaction of a customer, add COALESCE. Like:

SELECT c.customer_id, d.the_day
     , COALESCE(sum(t.tsla_amount) OVER w, 0) AS tsla_running_amount
     , COALESCE(sum(t.goog_amount) OVER w, 0) AS goog_running_amount
FROM ...

Related Solutions

Sql-server – Running total to the previous row

The answer is to use 1 PRECEDING, not CURRENT ROW -1. So, in your query, use:

    , SUM(s.OrderQty) OVER (PARTITION BY  SalesOrderID 
                            ORDER BY SalesOrderDetailID
                            ROWS BETWEEN UNBOUNDED PRECEDING 
                                     AND 1 PRECEDING) 
    AS  PreviousRunningTotal

Also note that on your other calculation:

    , SUM(s.OrderQty) OVER (PARTITION BY  SalesOrderID
                            ORDER BY SalesOrderDetailID) ...

SQL-Server uses the default ^* RANGE UNBOUNDED PRECEDING AND CURRENT ROW. I think that there is an efficiency difference and ROWS UNBOUNDED PRECEDING AND CURRENT ROW is to be preferred (after testing of course and if it gives the results you want).

Much more details you can find in the blog article by @Aaron Bertrand, including performance tests: Best approaches for running totals – updated for SQL Server 2012

^{* this is of course the default range when an ORDER BY is present inside the OVER clause - otherwise, without ORDER BY the default is the whole partition.}

Postgresql – Calculate column running total

Well, apparently this is it:

SELECT 
    *, 
    SUM(amount) OVER (
        ORDER BY tx_date DESC 
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS running_balance
FROM transactions
ORDER BY tx_date DESC;

Output:

 id |      memo       | amount |  tx_date   | running_balance 
----+-----------------+--------+------------+-----------------
  4 | Wages           |  75000 | 2015-08-31 |           77000
  2 | Eating out      |  -1000 | 2015-08-19 |            2000
  3 | Gas             |  -7000 | 2015-08-19 |            3000
  1 | Initial balance |  10000 | 2015-08-17 |           10000

Best Answer

Related Solutions

Sql-server – Running total to the previous row

Postgresql – Calculate column running total

Related Question