Given two tables:
CREATE TABLE foo (ts timestamp, foo text);
CREATE TABLE bar (ts timestamp, bar text);
I wish to write a query that returns values for ts
, foo
, and bar
that represents a unified view of the most recent values. In other words, if foo
contained:
ts | foo
--------
1 | A
7 | B
and bar
contained:
ts | bar
--------
3 | C
5 | D
9 | E
I want a query that returns:
ts | foo | bar
--------------
1 | A | null
3 | A | C
5 | A | D
7 | B | D
9 | B | E
If both tables have an event at the same time, the order does not matter.
I have been able to create the structure needed using union all and dummy values:
SELECT ts, foo, null as bar FROM foo
UNION ALL SELECT ts, null as foo, bar FROM bar
which will give me a linear timeline of new values, but I'm not quite able to work out how to populate the null values based on the previous rows. I've tried the lag
window function, but AFAICT it will only look at the previous row, not recursively backward. I've looked at recursive CTEs, but I'm not quite sure how to set up the start and termination conditions.
Best Answer
Use a
FULL [OUTER] JOIN
, combined with two rounds of window functions:Since
count()
does not count NULL values it conveniently only increases with every non-null value, thereby forming groups that will share the same value. In the outerSELECT
,min()
(ormax()
) likewise ignores NULL values, thereby picking the one non-null value per group. Voilá.Related
FULL JOIN
case:It's one of those cases where a procedural solution might just be faster, since it can get the job done in a single scan. Like this plpgsql function:
Call:
db<>fiddle here, demonstrating both.
Old sqlfiddle.
Related answer explaining the
#variable_conflict use_column
: