Postgresql – How to simplify a carry over values process

postgresql

I have created a sample case here for Postgres 10.0 (I am actually using the AWS equivalent of version 10.1):

https://www.db-fiddle.com/#&togetherjs=EKTJ6eQ62V

where you can find the table:

create table test(l text, v1 integer, v2 real, monthly_date date);

insert into test 
    values 
    ('a', 2, 1.3, '2001-01-01'),
    ('a', 1, 2.2, '2001-02-01'),
    ('a', 5, 6.2, '2001-04-01'),
    ('b', 3, 9.0, '2001-03-01');

The expected output is:

l   v1  v2  monthly_date
a   2   1.3 2001-01-01T00:00:00.000Z
a   1   2.2 2001-02-01T00:00:00.000Z
a   0   2.2 2001-03-01T00:00:00.000Z
a   5   6.2 2001-04-01T00:00:00.000Z
a   0   6.2 2001-05-01T00:00:00.000Z
a   0   6.2 2001-06-01T00:00:00.000Z
b   3   9   2001-03-01T00:00:00.000Z
b   0   9   2001-04-01T00:00:00.000Z
b   0   9   2001-05-01T00:00:00.000Z
b   0   9   2001-06-01T00:00:00.000Z

Within the monthly range from '2001-01-01' to '2001-06-01', any missing months after a label's first row are filled with the previous month's values. The only exception is column v1, which is set to 0 for the filled-in months.

The query that I am using at the moment is:

WITH
    md AS (
      SELECT *,
      LEAD(monthly_date) OVER (PARTITION BY l ORDER BY monthly_date) AS next_date
      FROM test
    ),
    calendar AS (
      select interval_date::date 
      from generate_series('2001-01-01'::date, '2001-06-01'::date, '1 month'::interval) interval_date
    )
    select T.l, coalesce (m2.v1, 0) as v1, T.v2, T.interval_date as monthly_date
    from (
        SELECT m.l, m.v1, m.v2, c.interval_date
        FROM calendar c
        JOIN md m
            ON c.interval_date BETWEEN m.monthly_date AND
            (CASE WHEN m.next_date IS NULL THEN '2001-06-01' ELSE m.next_date - '1 month'::interval END)
    ) T
    LEFT JOIN md m2 ON m2.l = T.l AND m2.monthly_date = T.interval_date;

This works fine, but it uses a LEFT JOIN just to inject the value 0 into column v1. Is there a better (possibly more efficient) way to do that?
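One way to drop the second LEFT JOIN, sketched here and not benchmarked: each calendar month is already joined to exactly one source row m (the latest row at or before that month), so the month is an original row precisely when it equals m.monthly_date. A CASE expression can then supply the 0, and comparing with `< next_date` avoids the interval arithmetic. This assumes, as in the sample data, that monthly_date is always the first of a month and unique per label.

```sql
-- Sketch: same calendar/range join as the question, but v1 comes from CASE
-- instead of a second LEFT JOIN back to md. Assumes first-of-month dates.
WITH md AS (
    SELECT *,
           LEAD(monthly_date) OVER (PARTITION BY l ORDER BY monthly_date) AS next_date
    FROM test
),
calendar AS (
    SELECT interval_date::date AS interval_date
    FROM generate_series('2001-01-01'::date, '2001-06-01'::date, '1 month'::interval) interval_date
)
SELECT m.l,
       -- the source row's own month keeps v1; carried-over months get 0
       CASE WHEN c.interval_date = m.monthly_date THEN m.v1 ELSE 0 END AS v1,
       m.v2,
       c.interval_date AS monthly_date
FROM calendar c
JOIN md m
  ON c.interval_date >= m.monthly_date
 AND (m.next_date IS NULL OR c.interval_date < m.next_date)  -- last row runs to the calendar end
ORDER BY m.l, c.interval_date;
```

With the sample data this should return the same ten rows as the expected output.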

Best Answer

You should test it; I can't guarantee this is better in terms of efficiency.

with x as -- all possible combinations 
(
    select distinct l, m
    from   test tt
    join   lateral (select m from
                    generate_series((select min(monthly_date) from test where l=tt.l), 
                                     '2001-06-01'::date, '1 month'::interval) m) t1 on true
)
select    x.l, 
          coalesce(test.v1, 0) as v1,
          coalesce(test.v2, (select v2 -- only executed for missing rows
                             from test 
                             where l = x.l and monthly_date < x.m 
                             order by l, monthly_date desc
                             limit 1)) as v2,
          m as monthly_date
from      x
left join test
on        test.l = x.l
and       date_trunc('month', monthly_date) = date_trunc('month', m)
order by  x.l, x.m;


l  | v1 | v2  | monthly_date          
:- | -: | :-- | :---------------------
a  |  2 | 1.3 | 2001-01-01 00:00:00+00
a  |  1 | 2.2 | 2001-02-01 00:00:00+00
a  |  0 | 2.2 | 2001-03-01 00:00:00+00
a  |  5 | 6.2 | 2001-04-01 00:00:00+01
a  |  0 | 6.2 | 2001-05-01 00:00:00+01
a  |  0 | 6.2 | 2001-06-01 00:00:00+01
b  |  3 | 9   | 2001-03-01 00:00:00+00
b  |  0 | 9   | 2001-04-01 00:00:00+01
b  |  0 | 9   | 2001-05-01 00:00:00+01
b  |  0 | 9   | 2001-06-01 00:00:00+01

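The time zone offsets in monthly_date above (+00, then +01 from April) suggest that generate_series, called with date arguments, resolved to its timestamp with time zone form, so the answer returns timestamps rather than dates. If plain dates are wanted, as in the expected output, a minimal variant casts the generated value back to date. Apart from that cast and a few table aliases, the logic is meant to be unchanged and, like the original, it is not benchmarked.

```sql
-- Sketch: the answer's query with the calendar value cast back to date
with x as -- all (l, month) combinations from each label's first month to 2001-06-01
(
    select distinct tt.l, g.m::date as m
    from   test tt
    join   lateral generate_series(
               (select min(monthly_date) from test where l = tt.l),
               '2001-06-01'::date, '1 month'::interval) as g(m) on true
)
select    x.l,
          coalesce(test.v1, 0) as v1,
          coalesce(test.v2, (select t2.v2   -- only evaluated when test.v2 is null
                             from   test t2
                             where  t2.l = x.l and t2.monthly_date < x.m
                             order  by t2.monthly_date desc
                             limit  1)) as v2,
          x.m as monthly_date
from      x
left join test
on        test.l = x.l
and       date_trunc('month', test.monthly_date) = date_trunc('month', x.m)
order by  x.l, x.m;
```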

Related Solutions

Postgresql – How to simplify a nested SELECT with PostgreSQL arrays

IN queries with huge sets are notoriously slow. It's often faster to use a JOIN instead:

SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT ltrim(member, 'w')::bigint AS id
   FROM  (
      SELECT unnest(members) AS member
      FROM   planet_osm_rels
      WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
      ) u
   WHERE member LIKE 'w%'
   ) x USING (id);

But that's not the most important problem here. I wonder why the GIN index planet_osm_rels_tags_hstore_idx is not being used. Are you selecting a large enough part of the table planet_osm_rels to justify a sequential scan?

Also, id is of type bigint, so cast to bigint instead of int, so that both sides of the join have the same type.

If you can extract "way IDs" and save them redundantly in a separate column way_ids bigint[] in your table, your query would become quite a bit simpler and faster, with one less subquery level:

SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT unnest(way_ids) AS id
   FROM   planet_osm_rels
   WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
   ) u USING (id);
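A sketch of how the way_ids column could be filled once, reusing the extraction from the first query. It assumes members is a text[] in which way members appear as 'w<id>', as the LIKE 'w%' filter implies. Because the column is redundant, it would also have to be kept in sync whenever members changes (for example with a trigger, not shown here).

```sql
-- Sketch: add the redundant column and populate it from members.
-- Assumes members is text[] with way members stored as 'w<id>'.
ALTER TABLE planet_osm_rels ADD COLUMN way_ids bigint[];

UPDATE planet_osm_rels
SET    way_ids = ARRAY(
           SELECT ltrim(u.member, 'w')::bigint   -- 'w123' -> 123
           FROM   unnest(members) AS u(member)
           WHERE  u.member LIKE 'w%'
       );
```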

PostgreSQL – Avoiding SUM Over Same Values Multiple Times

You could use a subquery, but you don't need to. Just don't sum the bonus; add it to the GROUP BY list instead.
Note that you also need to add student.id to the GROUP BY, even in your original query, in case two students have the same name.
You probably also need coalesce() for students without any scores:

SELECT st.name, 
       coalesce(sum(sc.score1),0) + coalesce(sum(sc.score2),0) + st.bonus AS total
FROM student st
LEFT JOIN score sc ON sc.student_id = st.id
GROUP BY st.id, st.name, st.bonus ;
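To make the double counting concrete, here is a self-contained sketch with hypothetical data. The table and column names are the ones used in the queries above; the int column types are assumptions. They are created as temporary tables, so for the current session they shadow any real tables with the same names. Ann has two score rows, so sum(st.bonus) adds her bonus twice, while the grouped query adds it once; Bob has no scores, which is where coalesce() matters.

```sql
-- Hypothetical sample data; column types are assumptions.
CREATE TEMP TABLE student (id int PRIMARY KEY, name text, bonus int);
CREATE TEMP TABLE score (student_id int REFERENCES student (id), score1 int, score2 int);

INSERT INTO student VALUES (1, 'Ann', 10), (2, 'Bob', 5);
INSERT INTO score   VALUES (1, 3, 4), (1, 2, NULL);   -- Ann: two rows, Bob: none

SELECT st.name,
       sum(st.bonus) AS summed_bonus,   -- bonus repeated once per joined score row
       coalesce(sum(sc.score1), 0) + coalesce(sum(sc.score2), 0) + st.bonus AS total
FROM student st
LEFT JOIN score sc ON sc.student_id = st.id
GROUP BY st.id, st.name, st.bonus
ORDER BY st.id;
```

This should return Ann with summed_bonus 20 and total 19, and Bob with 5 and 5.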

Since PostgreSQL 9.1, when st.id is the primary key of the student table, you can group by that column alone:

SELECT st.name, 
       coalesce(sum(sc.score1),0) + coalesce(sum(sc.score2),0) + st.bonus AS total
FROM student st
LEFT JOIN score sc ON sc.student_id = st.id
GROUP BY st.id ;

If you want a subquery, this is one way:

SELECT st.name, 
       coalesce(sc.score, 0) + st.bonus AS total
FROM student st
LEFT JOIN 
    ( SELECT student_id, coalesce(sum(score1),0) + coalesce(sum(score2),0) AS score
      FROM score 
      GROUP BY student_id
    ) AS sc ON sc.student_id = st.id ;
