How could one write a Postgresql query that gives both daily and cumulative counts without gaps? E.g. if there is no data for a given day, it would show 0
as a daily count for that that day and the same cumulative sum as the previous day?
I think I'd need GENERATE_SERIES
, but I don't know how to do it. I'm also not entirely sure if order by day asc rows between unbounded preceding and current row
would work correctly always, but maybe not the biggest issue here.
I have tried to craft the query with PARTITION BY
and
Let's say I write a table and add data such as
create table test
(
id int4 NOT NULL GENERATED ALWAYS AS IDENTITY,
data int4,
created_at timestamptz NOT NULL DEFAULT now()
);
insert into test(data, created_at) values(1, '2021-04-01');
insert into test(data, created_at) values(2, '2021-04-01');
insert into test(data, created_at) values(3, '2021-04-02');
insert into test(data, created_at) values(4, '2021-04-03');
insert into test(data, created_at) values(5, '2021-04-05');
insert into test(data, created_at) values(6, '2021-04-07');
and then create a queries such as
SELECT
created_at as "Date",
count(1) as "Daily count"
FROM test
WHERE created_at >= '2021-04-01'
AND created_at <= '2021-04-30'
GROUP BY 1
giving
Date | Daily count |
---|---|
2021-04-01 00:00:00 | 2 |
2021-04-02 00:00:00 | 1 |
2021-04-03 00:00:00 | 1 |
2021-04-05 00:00:00 | 1 |
2021-04-07 00:00:00 | 1 |
with data as (
select
date_trunc('day', created_at) as day,
count(1)
from test
group by 1
)
select
day,
running_total(count) over (order by day asc rows between unbounded preceding and current row)
from data
day | running_total |
---|---|
2021-04-01 00:00:00 | 2 |
2021-04-02 00:00:00 | 3 |
2021-04-03 00:00:00 | 4 |
2021-04-05 00:00:00 | 5 |
2021-04-07 00:00:00 | 6 |
But as noted, how could these two be combined without gaps on daily values? Somehow it feels I get close but bump into some (syntax) problem. Maybe those two queries are the simplest and cleanest examples of what I'm thinking.
Best Answer
You write in your question:
Indeed you do - and you can as follows (all the code below is also given in the fiddle here):
Just to check:
Result:
Then, I use your data:
Populate (slightly modified from the question - so as not to have the id the same as the provided data (i.e. 1...6), - instead, I explicitly inserted hundreds - improves legibility):
And then run the following SQL:
Result:
A result which, I believe covers all your requested elements.
In this case, you don't have to worry about the FRAME clause (i.e. the
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) bit - FRAMEs are a kind of sub-partitioning method - but the long phrase above is actually the default - see here for a good introduction to window frames in PostgreSQL.You could also (depending on your requirements) use a CTE (Common Table Expression) for your query if you don't wish to create a permanent calendar table (see my answer here for an example of this). (+1 for having made me think!).