Handling Long Sequences of Missing Values in PostgreSQL

countgaps-and-islandsnullpostgresqlwindow functions

I have a table like this:

create table foo (foo_label text, foo_price int, foo_date date);

insert into foo (
          values
          ('aaa', 100,  '2017-01-01'),
          ('aaa', NULL, '2017-02-01'),
          ('aaa', NULL, '2017-03-01'),
          ('aaa', NULL, '2017-04-01'),
          ('aaa', 140,  '2017-05-01'),
          ('aaa', NULL, '2017-06-01'),
          ('aaa', 180,  '2017-07-01')
        );

As you can see a few values on the foo_price column are missing.

What I need is that missing values get filled up with the "previous" available value in this way:

 foo_label | fixed_foo_price | foo_date
-----------+-----------------+------------
 aaa       | 100             | 2017-01-01
 aaa       | 100             | 2017-02-01
 aaa       | 100             | 2017-03-01
 aaa       | 100             | 2017-04-01
 aaa       | 140             | 2017-05-01
 aaa       | 140             | 2017-06-01
 aaa       | 180             | 2017-07-01

My attempt:

select 
    foo_label, 
    (case when foo_price is null then previous_foo_price else foo_price end) as fixed_foo_price,
    foo_date
from (
  select 
      foo_label, 
      lag(foo_price) OVER (PARTITION BY foo_label order by foo_date::date) as previous_foo_price, 
      foo_price,
      foo_date
      from foo
) T;

As you can see from here:

https://www.db-fiddle.com/#&togetherjs=s6giIonUxT

It doesn't fill completely the '100' series.

Any idea how can I get the wanted result?

Best Answer

I would form groups with the window function count() and then take the first value for each group:

SELECT foo_label
     , first_value(foo_price) OVER (PARTITION BY foo_label, grp ORDER BY foo_date) AS fixed_foo_price
     , foo_date
FROM  (
   SELECT foo_label
        , count(foo_price) OVER (PARTITION BY foo_label ORDER BY foo_date) AS grp
        , foo_price
        , foo_date
   FROM   foo
   ) sub;

This works because count() only counts non-null values. So all rows with NULL end up in the same group as the last row with an actual value. Exactly what you need.