Handling Long Sequences of Missing Values in PostgreSQL

Tags: count, gaps-and-islands, null, postgresql, window functions

I have a table like this:

create table foo (foo_label text, foo_price int, foo_date date);

insert into foo (
          values
          ('aaa', 100,  '2017-01-01'),
          ('aaa', NULL, '2017-02-01'),
          ('aaa', NULL, '2017-03-01'),
          ('aaa', NULL, '2017-04-01'),
          ('aaa', 140,  '2017-05-01'),
          ('aaa', NULL, '2017-06-01'),
          ('aaa', 180,  '2017-07-01')
        );

As you can see, a few values in the foo_price column are missing.

I need each missing value to be filled with the previous available value, like this:

 foo_label | fixed_foo_price | foo_date
-----------+-----------------+------------
 aaa       | 100             | 2017-01-01
 aaa       | 100             | 2017-02-01
 aaa       | 100             | 2017-03-01
 aaa       | 100             | 2017-04-01
 aaa       | 140             | 2017-05-01
 aaa       | 140             | 2017-06-01
 aaa       | 180             | 2017-07-01

My attempt:

select 
    foo_label, 
    (case when foo_price is null then previous_foo_price else foo_price end) as fixed_foo_price,
    foo_date
from (
  select 
      foo_label, 
      lag(foo_price) OVER (PARTITION BY foo_label order by foo_date::date) as previous_foo_price, 
      foo_price,
      foo_date
      from foo
) T;

As you can see from here:

https://www.db-fiddle.com/#&togetherjs=s6giIonUxT

It doesn't completely fill the '100' series.

Any idea how I can get the desired result?

Best Answer

I would form groups with the window function count() and then take the first value for each group:

SELECT foo_label
     , first_value(foo_price) OVER (PARTITION BY foo_label, grp ORDER BY foo_date) AS fixed_foo_price
     , foo_date
FROM  (
   SELECT foo_label
        , count(foo_price) OVER (PARTITION BY foo_label ORDER BY foo_date) AS grp
        , foo_price
        , foo_date
   FROM   foo
   ) sub;

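This works because count() only counts non-null values. The running count therefore increases only on rows that have an actual price, so every NULL row gets the same grp as the most recent row with a value. Within each group the row with the value has the earliest foo_date, so first_value() returns it for every row in the group. Exactly what you need. Rows that precede the first non-null value for a label fall into group 0 and stay NULL, since there is no earlier value to carry forward.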
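To see the grouping at work, it can help to run the inner query on its own. The following is a small sketch against the foo table from the question; the expected output shown in the comments was derived by hand from the sample rows.

```sql
-- Inspect the running count that the subquery uses as the group key.
-- count(foo_price) skips NULLs, so grp only increases on rows with a price.
SELECT foo_date
     , foo_price
     , count(foo_price) OVER (PARTITION BY foo_label ORDER BY foo_date) AS grp
FROM   foo
ORDER  BY foo_label, foo_date;

--   foo_date  | foo_price | grp
-- ------------+-----------+-----
--  2017-01-01 |       100 |   1
--  2017-02-01 |           |   1
--  2017-03-01 |           |   1
--  2017-04-01 |           |   1
--  2017-05-01 |       140 |   2
--  2017-06-01 |           |   2
--  2017-07-01 |       180 |   3
```

Each distinct grp contains exactly one non-null price, which is the value the outer first_value() spreads over the rest of the group.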

Related Solutions

SQL Server – Easier way to handle so many ISNULL() situations

No, there is no way to tell SQL Server to treat all NULL float values as zero. You will have to surround these expressions with ISNULL() or, better yet IMHO, COALESCE(). You can do this in a view so you don't have to repeat it in every query.
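As a rough sketch of the view approach in T-SQL: the table dbo.Readings, its columns, and the view name below are hypothetical and exist only for illustration.

```sql
-- Hypothetical table with nullable float columns (illustration only).
CREATE TABLE dbo.Readings (
    reading_id  int   NOT NULL PRIMARY KEY,
    temperature float NULL,
    pressure    float NULL
);
GO

-- The view replaces NULL with 0 once, so callers do not have to repeat it.
CREATE VIEW dbo.ReadingsNoNulls
AS
SELECT reading_id
     , COALESCE(temperature, 0) AS temperature
     , COALESCE(pressure, 0)    AS pressure
FROM   dbo.Readings;
GO

-- Queries against the view can use the columns without further NULL handling.
SELECT reading_id
     , temperature + pressure AS total
FROM   dbo.ReadingsNoNulls;
```

GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement; it is needed here because CREATE VIEW must be the first statement in its batch.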

SQL Server 2012 – Finding Missing Data Gaps in Large Tables

There is no need to generate dates.

The following query will give you a list of SHORTCODES with no rows at all:

select SHORTCODE from shortcodes
except
select SHORTCODE from VWTBL_INDICATOR

The following query will give you the continuous ranges of MonthYear per SHORTCODE.

select      SHORTCODE
            ,min(MonthYear) as from_MonthYear
            ,max(MonthYear) as to_MonthYear
            ,count(*)       as months

from       (SELECT   SHORTCODE
                    ,MonthYear
                    ,row_number() over (partition by SHORTCODE order by MonthYear)  as rn

            From     VWTBL_INDICATOR
            ) t

group by    SHORTCODE
            ,DATEADD(month,-rn,MonthYear)   

order by    SHORTCODE
            ,from_MonthYear
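The grouping expression DATEADD(month,-rn,MonthYear) is the heart of the query above: within a run of consecutive months, MonthYear and rn both advance by one per row, so their difference stays constant, and it jumps whenever a month is skipped. The sketch below shows the intermediate values on made-up data; it assumes MonthYear is a date that always falls on the first of the month, with at most one row per SHORTCODE and month.

```sql
-- Made-up sample: shortcode 'A' has Jan-Mar 2017, then a gap, then Jun-Jul 2017.
SELECT SHORTCODE
     , MonthYear
     , rn
     , DATEADD(month, -rn, MonthYear) AS grp_key
FROM  (
    SELECT SHORTCODE
         , MonthYear
         , ROW_NUMBER() OVER (PARTITION BY SHORTCODE ORDER BY MonthYear) AS rn
    FROM  (VALUES ('A', CAST('2017-01-01' AS date))
                , ('A', CAST('2017-02-01' AS date))
                , ('A', CAST('2017-03-01' AS date))
                , ('A', CAST('2017-06-01' AS date))
                , ('A', CAST('2017-07-01' AS date))
          ) AS v (SHORTCODE, MonthYear)
    ) t
ORDER BY SHORTCODE, MonthYear;

-- SHORTCODE  MonthYear   rn  grp_key
-- A          2017-01-01  1   2016-12-01
-- A          2017-02-01  2   2016-12-01
-- A          2017-03-01  3   2016-12-01
-- A          2017-06-01  4   2017-02-01
-- A          2017-07-01  5   2017-02-01
```

Grouping by SHORTCODE and grp_key then yields one row per continuous range, here Jan-Mar and Jun-Jul. The value of grp_key has no meaning by itself; it only serves to tell the ranges apart.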

If you wish, you can use the following version, which adds another layer of information:

missing_from_MonthYear + missing_to_MonthYear: the missing range between this range and the next one (missing_to_MonthYear is NULL for the last range)
ranges: number of ranges per SHORTCODE (ranges > 1 means you have gaps in the middle)
range_seq: the sequential number of each range within its SHORTCODE
is_first: flags the first range per SHORTCODE (check exists_from_MonthYear to see if you are missing preceding dates)
is_last: flags the last range per SHORTCODE (check exists_to_MonthYear to see if you are missing following dates)

select      SHORTCODE
           ,from_MonthYear                                                                                  as exists_from_MonthYear
           ,to_MonthYear                                                                                    as exists_to_MonthYear
           ,dateadd (day,1,to_MonthYear)                                                                    as missing_from_MonthYear
           ,dateadd (day,-1,lead (from_MonthYear) over (partition by SHORTCODE order by from_MonthYear))    as missing_to_MonthYear
           ,count       (*) over (partition by SHORTCODE)                                                   as ranges
           ,row_number  ()  over (partition by SHORTCODE order by from_MonthYear)                           as range_seq
           ,case from_MonthYear when min(from_MonthYear) over (partition by SHORTCODE) then 1 end           as is_first
           ,case to_MonthYear   when max(to_MonthYear)   over (partition by SHORTCODE) then 1 end           as is_last

from       (select      SHORTCODE
                       ,min(MonthYear)  as from_MonthYear
                       ,max(MonthYear)  as to_MonthYear
                       ,count(*)        as months

            from       (SELECT      SHORTCODE
                                   ,MonthYear
                                   ,row_number() over (partition by SHORTCODE order by MonthYear)   as rn

                        From        VWTBL_INDICATOR
                        ) t

            group by    SHORTCODE
                       ,DATEADD(month,-rn,MonthYear)    
            ) t

order by    SHORTCODE
           ,from_MonthYear
