Postgresql – How to do a cumulative count for certain values

aggregate postgresql postgresql-9.6

I need to compute cumulative counts in a particular way, so I set up a small experiment before running the query in the actual environment.

I have created a table as:

create table ttable (id numeric(10,0), txt text, dtd timestamp, cnt numeric);

And inserted the values as:

insert into ttable values (1,'Pen','2019-01-01',10),
  (2,'Pencil','2019-01-01',10),
  (3,'Pen','2019-01-02',20),
  (4,'Eraser','2019-01-02',1),
  (5,'Eraser','2019-01-03',1),
  (6,'Other1','2019-01-03',10),
  (7,'Other1','2019-01-04',10);

The table now contains the following data:

 id  txt      dtd                   cnt
 --  ------   -------------------   ---
 1   Pen      2019-01-01 00:00:00   10
 2   Pencil   2019-01-01 00:00:00   10
 3   Pen      2019-01-02 00:00:00   20
 4   Eraser   2019-01-02 00:00:00   1
 5   Eraser   2019-01-03 00:00:00   1
 6   Other1   2019-01-03 00:00:00   10
 7   Other1   2019-01-04 00:00:00   10

I want the output as below:

txt      dtd                    cnt
------   -------------------    ---
Eraser   2019-01-01 00:00:00    0
Pen      2019-01-01 00:00:00    10
Pencil   2019-01-01 00:00:00    10

Eraser   2019-01-02 00:00:00    1
Pen      2019-01-02 00:00:00    30
Pencil   2019-01-02 00:00:00    10

Eraser   2019-01-03 00:00:00    2
Pen      2019-01-03 00:00:00    30
Pencil   2019-01-03 00:00:00    10
Other1   2019-01-03 00:00:00    10

Eraser   2019-01-04 00:00:00    2
Pen      2019-01-04 00:00:00    30
Pencil   2019-01-04 00:00:00    10
Other1   2019-01-04 00:00:00    20

(I have added empty lines to the output above for readability.)

Only the first three items (Eraser, Pen, Pencil) should be repeated for every date; any other item should appear only on dates where it has an entry.

I found a query on Stack Overflow for generating a date range and modified it for my needs:

select dt::date from generate_series('2019-01-01', '2019-01-05', '1 day'::interval) dt

I could join the query above with my main table data. However, I couldn't get the result I want:

with b1 as (
select id, txt, dtd, sum(cnt) over (partition by txt order by dtd) cnt from ttable
group by 1,2,3, cnt order by 3
),
b2 as 
(select dt::date from generate_series('2019-01-01', '2019-01-05', '1 day'::interval) dt ),
b3 as 
(select 'Pen' obj union select 'Pencil' obj union select 'Eraser' obj)
select b2.dt, b1.*, b3.* from b2 left join b1 on b1.dtd = b2.dt left join b3 on b1.txt = b3.obj

What am I missing here?

Best Answer

I think that you want something like this:

WITH item AS
(
  SELECT txt FROM ttable WHERE txt IN ('Eraser', 'Pen', 'Pencil')

  -- You have specified these three items as somehow "special", so that's where 
  -- this bit comes from.

),
item_date AS
(
  SELECT DISTINCT i.txt, t.dtd FROM item i, ttable t  -- CROSS JOIN special items
  ORDER BY dtd, txt                                   -- and dates
)
SELECT id.dtd, id.txt, COALESCE(t.cnt, 0) AS icnt -- COALESCE to get 0 for NULLs
FROM item_date id
LEFT JOIN ttable t
  ON id.txt = t.txt
  AND id.dtd = t.dtd
UNION                             -- the UNION here is to reunite non-special items
SELECT t.dtd, t.txt, t.cnt        -- i.e. Other1 with the special ones.
FROM ttable t
WHERE t.txt NOT IN (SELECT txt FROM item)
ORDER BY dtd, txt

Result (dates separated for clarity as in your question):

                dtd txt   icnt
-----------------------------
2019-01-01 00:00:00 Eraser   0
2019-01-01 00:00:00 Pen     10
2019-01-01 00:00:00 Pencil  10

2019-01-02 00:00:00 Eraser   1
2019-01-02 00:00:00 Pen     20
2019-01-02 00:00:00 Pencil   0  -- << your INSERT has no Pencils for 2019-01-02

2019-01-03 00:00:00 Eraser   1
2019-01-03 00:00:00 Other1  10
2019-01-03 00:00:00 Pen      0
2019-01-03 00:00:00 Pencil   0

2019-01-04 00:00:00 Eraser   0
2019-01-04 00:00:00 Other1  10
2019-01-04 00:00:00 Pen      0
2019-01-04 00:00:00 Pencil   0
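
The result above shows the per-date counts rather than a running total. As a hedged sketch (not part of the original answer), the same "fixed items on every date, other items only where present" grid can be combined with a window function to accumulate the counts per item. The CTE names days, fixed_items and grid are hypothetical, and the dates are assumed to be the ones present in ttable rather than a generated calendar:

```sql
WITH days AS (
  -- every date that occurs in the data (assumption: no generated calendar)
  SELECT DISTINCT dtd FROM ttable
),
fixed_items AS (
  -- the three items that must appear on every date
  SELECT unnest(ARRAY['Eraser', 'Pen', 'Pencil']) AS txt
),
grid AS (
  -- fixed items on every date ...
  SELECT d.dtd, f.txt FROM days d CROSS JOIN fixed_items f
  UNION
  -- ... plus any other item, only on dates where it has a row
  SELECT dtd, txt FROM ttable
)
SELECT g.txt,
       g.dtd,
       -- inner sum: count for that item on that date (0 if none);
       -- outer windowed sum: running total per item, ordered by date
       sum(coalesce(sum(t.cnt), 0))
         OVER (PARTITION BY g.txt ORDER BY g.dtd) AS cnt
FROM grid g
LEFT JOIN ttable t
  ON t.txt = g.txt
 AND t.dtd = g.dtd
GROUP BY g.txt, g.dtd
ORDER BY g.dtd, g.txt;
```

With the sample data, Eraser starts at 0 on 2019-01-01 and reaches 2 from 2019-01-03 onward, Pen reaches 30 on 2019-01-02 and stays there, and Other1 only appears on 2019-01-03 and 2019-01-04. To cover dates that have no rows at all, the days CTE could be replaced by the generate_series query from the question.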

Related Solutions

MySQL – Rolling Count of Total Transactions Over Time

What you want is called a cumulative sum. You can do something like:

create table transactions (transactionid int, d date);
insert into transactions (transactionid, d) 
    values (1, '2014-08-04'),(2,'2014-08-05'), (3, '2014-08-18')
         , (4, '2014-08-18'), (5,'2014-08-20');

select x.y, x.w,  count(1) 
from ( 
   select distinct year(d) as y, week(d) as w 
   from transactions
) as x 
join transactions y 
    on year(y.d) < x.y
    or ( year(y.d) = x.y
     and week(y.d) <= x.w ) 
group by x.y, x.w;  

+------+------+----------+
| y    | w    | count(1) |
+------+------+----------+
| 2014 |   31 |        2 |
| 2014 |   33 |        5 |
+------+------+----------+

I did not see your additional request for 2 2 for 2014. You can do that by replacing:

select distinct year(d) as y, week(d) as w 
from transactions

...with an expression that creates the whole domain for weeks. It is often a good idea to create a calendar table that you can use to join against to get reports for missing values etc.
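
A minimal sketch of the calendar-table idea, assuming a hypothetical table calendar_week that holds one row per (year, week) to report on; the table name, its columns and the alias running_count are illustrative and not from the original answer. A LEFT JOIN is used so that weeks without any earlier transactions would still be listed with a count of 0:

```sql
-- Hypothetical calendar table: one row per week that should appear in the report.
create table calendar_week (y int, w int);
insert into calendar_week (y, w)
    values (2014, 31), (2014, 32), (2014, 33);

-- Same cumulative join as above, but driven by the calendar table
-- instead of the distinct weeks found in transactions.
select c.y, c.w, count(t.transactionid) as running_count
from calendar_week c
left join transactions t
    on year(t.d) < c.y
    or ( year(t.d) = c.y
     and week(t.d) <= c.w )
group by c.y, c.w
order by c.y, c.w;
```

Because week 32 has no transactions of its own, it is reported with the same running count as week 31. In practice the calendar table would be populated for the whole reporting range rather than by hand.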

Postgresql – Cumulative data and dates – the joins aren’t working

Inspired by a_horse_with_no_name, I moved the right-hand side of the join into a subquery so that the WHERE clause doesn't interfere with the join condition.

This works as intended:

WITH dates  as (SELECT min(date) as start_date,
                       max(date) as end_date
                from training_training
                where athlete_id = 1)

SELECT  distinct(d.date),
   sum(distance) OVER (ORDER BY d.date)
FROM
  (
    SELECT generate_series(start_date, end_date, interval '1 day') as date
    FROM dates
  ) d
LEFT JOIN
  (
    SELECT
      date,
      distance
    FROM training_training
    WHERE athlete_id = 1
          AND kind IN ('t', 'd', 'i', 'w')
  ) t
ON d.date = t.date

GROUP BY d.date, distance
ORDER BY d.date
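
As an alternative sketch (an assumption, not the author's method), the filters can instead be placed in the join condition and the distances aggregated per day before the running sum is taken. This avoids the DISTINCT and keeps two sessions on the same day with the same distance from collapsing into one row. Table and column names are taken from the query above; the alias cumulative_distance is hypothetical:

```sql
WITH dates AS (
  SELECT min(date) AS start_date,
         max(date) AS end_date
  FROM training_training
  WHERE athlete_id = 1
)
SELECT d.date,
       -- inner sum: total distance for that day (0 on days without training);
       -- outer windowed sum: running total across days
       sum(coalesce(sum(t.distance), 0)) OVER (ORDER BY d.date) AS cumulative_distance
FROM (
    SELECT generate_series(start_date, end_date, interval '1 day') AS date
    FROM dates
  ) d
LEFT JOIN training_training t
  ON t.date = d.date
 AND t.athlete_id = 1                  -- filters live in the ON clause,
 AND t.kind IN ('t', 'd', 'i', 'w')    -- so days without a match are kept
GROUP BY d.date
ORDER BY d.date;
```

Keeping the filters in the ON clause rather than in WHERE is what preserves the generated days that have no matching training rows.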
