DB2 – Handling last_value Ignoring Nulls in LEFT JOIN with Additional Dimension

db2db2-luwwindow functions

This is just a silly example that I started to think about, but can't seem to come up with an elegant solution for. Assuming a Calendar table (I'll use Db2 syntax):

create table calendar
( dt date not null primary key
);

insert into calendar (dt)
with t (dt) as ( values cast('2020-01-01' as date)
                 union all
                 select dt + 1 day from t where dt < '2020-01-11')
select dt from t                 
;

and a table that holds the balance:

create table balance 
( dt date not null
, amount int not null
,     primary key (dt)
);

insert into balance (dt, amount)
values ('2020-01-03',100) ,('2020-01-05', -50);

If we want to copy the last known balance, we can use LAST_VALUE and 'IGNORE NULLS' like:

select c.dt, last_value(b.amount, 'IGNORE NULLS') over (order by c.dt)
from calendar c
left join balance b
    on c.dt = b.dt;

However, if we add a dimension, say cid (customer_id), it is no longer obvious what last_value means. Normally we would partition by cid, but cid is lost due to the left join:

create table balance1 
( cid int not null
, dt date not null
, amount int not null
, primary key (cid, dt)
);

insert into balance1 (cid, dt, amount)
values (1, '2020-01-03',100) ,(1, '2020-01-05', -50)
     , (2, '2020-01-04',75), (2, '2020-01-08',55), (2, '2020-01-10', -35);


select c.dt, last_value(b.amount, 'IGNORE NULLS') over (partition by ? 
                                                        order by c.dt)
from calendar c
left join balance b
    on c.dt = b.dt;

The best I could come up with was to use a cartesian product between calendar and distinct customers from balances:

select cid, dt, last_value(amount, 'IGNORE NULLS') over (partition by cid order by dt)
from (
  select cid, dt, amount from balance1
  union
  select distinct b.cid, c.dt, null 
  from balance1 b
  cross join calendar c
  where not exists (
    select 1 from balance1 
    where dt = c.dt and cid = b.cid
  )
) t  
order by dt, cid
;

Not that pretty, and I'm looking for a more elegant solution. I used Db2 syntax above and in this Fiddle but it's the principle I'm after so any vendor syntax will do.

Best Answer

Why not this?

SELECT def.cid, c.dt, LAST_VALUE(b.amount, 'IGNORE NULLS') OVER(PARTITION BY def.cid ORDER BY c.dt)
  FROM calendar c
    INNER JOIN (SELECT DISTINCT cid FROM balance1) AS def(cid) ON 1=1
    LEFT JOIN balance1 b ON (c.dt, def.cid) = (b.dt, b.cid)
  ORDER BY c.dt, def.cid
;

Comparison in dbfiddle