PostgreSQL – How to do a cumulative count for certain values

aggregate, postgresql, postgresql-9.6

I need to compute cumulative counts in a particular way, so I set up a small experiment before running the query in the actual environment.

I created a table:

create table ttable (id numeric(10,0), txt text, dtd timestamp, cnt numeric)

and inserted these values:

insert into ttable values (1,'Pen','2019-01-01',10),
  (2,'Pencil','2019-01-01',10),
  (3,'Pen','2019-01-02',20),
  (4,'Eraser','2019-01-02',1),
  (5,'Eraser','2019-01-03',1),
  (6,'Other1','2019-01-03',10),
  (7,'Other1','2019-01-04',10);

The table now contains:

 id  txt      dtd                   cnt
 --  ------   -------------------   ---
 1   Pen      2019-01-01 00:00:00   10
 2   Pencil   2019-01-01 00:00:00   10
 3   Pen      2019-01-02 00:00:00   20
 4   Eraser   2019-01-02 00:00:00   1
 5   Eraser   2019-01-03 00:00:00   1
 6   Other1   2019-01-03 00:00:00   10
 7   Other1   2019-01-04 00:00:00   10

I want the output as below:

txt      dtd                    cnt
------   -------------------    ---
Eraser   2019-01-01 00:00:00    0
Pen      2019-01-01 00:00:00    10
Pencil   2019-01-01 00:00:00    10

Eraser   2019-01-02 00:00:00    1
Pen      2019-01-02 00:00:00    30
Pencil   2019-01-02 00:00:00    10

Eraser   2019-01-03 00:00:00    2
Pen      2019-01-03 00:00:00    30
Pencil   2019-01-03 00:00:00    10
Other1   2019-01-03 00:00:00    10

Eraser   2019-01-04 00:00:00    2
Pen      2019-01-04 00:00:00    30
Pencil   2019-01-04 00:00:00    10
Other1   2019-01-04 00:00:00    20

(I have added blank lines to the output above for readability.)

Only the first three items (Eraser, Pen, Pencil) should be repeated for every date; the rest should appear only on dates for which they have an entry.

I found a query on Stack Overflow for generating a date range and modified it for my needs:

select dt::date from generate_series('2019-01-01', '2019-01-05', '1 day'::interval) dt

I can join the query above with my main table, but I could not get the result I want:

with b1 as (
select id, txt, dtd, sum(cnt) over (partition by txt order by dtd) cnt from ttable
group by 1,2,3, cnt order by 3
),
b2 as 
(select dt::date from generate_series('2019-01-01', '2019-01-05', '1 day'::interval) dt ),
b3 as 
(select 'Pen' obj union select 'Pencil' obj union select 'Eraser' obj)
select b2.dt, b1.*, b3.* from b2 left join b1 on b1.dtd = b2.dt left join b3 on b1.txt = b3.obj

What am I missing here?

Best Answer

I think that you want something like this:

WITH item AS
(
  SELECT txt FROM ttable WHERE txt IN ('Eraser', 'Pen', 'Pencil')

  -- You have specified these three items as somehow "special", so that's where 
  -- this bit comes from.

),
item_date AS
(
  SELECT DISTINCT i.txt, t.dtd FROM item i, ttable t  -- CROSS JOIN special items
  ORDER BY dtd, txt                                   -- and dates
)
SELECT id.dtd, id.txt, COALESCE(t.cnt, 0) AS icnt -- COALESCE to get 0 for NULLs
FROM item_date id
LEFT JOIN ttable t
  ON id.txt = t.txt
  AND id.dtd = t.dtd
UNION                             -- the UNION here is to reunite non-special items
SELECT t.dtd, t.txt, t.cnt        -- i.e. Other1 with the special ones.
FROM ttable t
WHERE t.txt NOT IN (SELECT txt FROM item)
ORDER BY dtd, txt

Result (dates separated for clarity as in your question):

dtd                 txt   icnt
------------------------------
2019-01-01 00:00:00 Eraser   0
2019-01-01 00:00:00 Pen     10
2019-01-01 00:00:00 Pencil  10

2019-01-02 00:00:00 Eraser   1
2019-01-02 00:00:00 Pen     20
2019-01-02 00:00:00 Pencil   0  -- << your INSERT has no Pencils for 2019-01-02

2019-01-03 00:00:00 Eraser   1
2019-01-03 00:00:00 Other1  10
2019-01-03 00:00:00 Pen      0
2019-01-03 00:00:00 Pencil   0

2019-01-04 00:00:00 Eraser   0
2019-01-04 00:00:00 Other1  10
2019-01-04 00:00:00 Pen      0
2019-01-04 00:00:00 Pencil   0
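
Note that the result above shows per-day counts, not running totals. If cumulative values are what is wanted, as the desired output in the question suggests, one possible approach is to build the same item/date grid and apply a windowed SUM over it. The following is only a sketch under that assumption: the CTE names (special, dates, daily, grid) are made up for illustration, and the dates are taken from ttable itself rather than from a calendar.

```sql
-- Sketch only: CTE names are illustrative, not part of the original answer.
WITH special (txt) AS (
  VALUES ('Eraser'::text), ('Pen'), ('Pencil')  -- items repeated on every date
),
dates AS (
  SELECT DISTINCT dtd FROM ttable               -- dates present in the data
),
daily AS (
  SELECT txt, dtd, SUM(cnt) AS cnt              -- one row per item per date
  FROM ttable
  GROUP BY txt, dtd
),
grid AS (
  SELECT s.txt, d.dtd
  FROM special s
  CROSS JOIN dates d
  UNION                                         -- other items only where they occur
  SELECT txt, dtd FROM daily
)
SELECT g.txt,
       g.dtd,
       -- running total per item; COALESCE covers dates before the first entry
       COALESCE(SUM(d.cnt) OVER (PARTITION BY g.txt ORDER BY g.dtd), 0) AS cnt
FROM grid g
LEFT JOIN daily d
  ON d.txt = g.txt
 AND d.dtd = g.dtd
ORDER BY g.dtd, g.txt;
```

With the sample data this should give Eraser 0, 1, 2, 2; Pen 10, 30, 30, 30; and Pencil 10, 10, 10, 10 across the four dates, plus Other1 10 on 2019-01-03 and 20 on 2019-01-04. If dates with no rows at all must also appear, the dates CTE could be swapped for the generate_series call from the question.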