PostgreSQL – Filling in Dates – Postgres/Redshift SQL

postgresql

I have a dates table with a "calendar_date" field, and I have been trying to use it to fill in missing dates: when an id has no record for a particular day, that day should still appear in the result with a daily count of 0. I have not been able to find a working solution.

Result example: [image not available]

        SELECT
            id,
            date_trunc('day', created_at) AS activity_date,
            COALESCE(COUNT(DISTINCT user_id), 0) AS daily_count
        FROM events e
        WHERE date(date_trunc('day', created_at)) <= CURRENT_DATE - INTERVAL '1 day'
          AND date(date_trunc('day', created_at)) >= CURRENT_DATE - INTERVAL '90 day'
          AND name = 'visited site'
          AND user_id != 0
        GROUP BY date_trunc('day', created_at), id

Any ideas/help would be appreciated!

Best Answer

In Postgres (which you explicitly tagged) you can do it with the query below. However, I suspect you are not really using Postgres, and the following is not supported by Redshift.

select d.id,
       d.activity_date,
       count(distinct e.user_id) as daily_count
from (
  -- every (id, day) pair for ids with at least one event in the last 90 days
  select distinct e1.id, g.dt::date as activity_date
  from events e1
    cross join generate_series(current_date - 90, current_date - 1, interval '1 day') as g(dt)
  -- events has created_at (a timestamp), not activity_date
  where e1.created_at >= current_date - 90
    and e1.created_at < current_date
) d
  left join events e 
    on e.created_at::date = d.activity_date
   and e.id = d.id
   and e.user_id <> 0
   and e.name = 'visited site'
group by d.id, d.activity_date

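The query first builds the list of all combinations of IDs and dates in the desired interval. This is necessary because you also want to group by the id column. A left join then brings in the events for the aggregation. For an ID and date with no matching event, the joined columns are NULL, and because count() ignores NULLs the daily count comes out as 0.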
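Since the question mentions an existing dates table, the same "all IDs times all dates, then left join" idea can be sketched without generate_series. This is only a sketch under assumptions: the table name `dates` is a guess (the question only names its `calendar_date` column), `calendar_date` is assumed to be of type date, and I have not run it against Redshift.

```sql
-- Sketch: "dates" is an assumed name for the calendar table from the question.
select i.id,
       c.calendar_date as activity_date,
       count(distinct e.user_id) as daily_count
from (
  -- ids that have at least one event in the last 90 days
  select distinct id
  from events
  where created_at >= current_date - 90
    and created_at < current_date
) i
  -- pair every id with every calendar day
  cross join dates c
  -- attach matching events; unmatched (id, day) pairs keep NULLs
  left join events e
    on e.id = i.id
   and e.created_at::date = c.calendar_date
   and e.user_id <> 0
   and e.name = 'visited site'
where c.calendar_date >= current_date - 90
  and c.calendar_date <= current_date - 1
group by i.id, c.calendar_date
order by i.id, c.calendar_date;
```

The event filters sit in the join condition rather than in the WHERE clause. Putting them in WHERE would discard the NULL rows and the zero-count days with them.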

The WHERE clause in the derived table (inner query) means that only IDs with at least one event in the desired range are used. If you need all IDs regardless of whether there was an event in that time frame, you need to either pull them from a different table or remove the WHERE clause of the inner query.
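As a rough illustration of the "pull them from a different table" option, the derived table could read its IDs from a separate table instead of from events. The table `all_ids` and its `id` column are hypothetical; the question does not say where the full list of IDs lives.

```sql
-- Sketch: "all_ids" is a hypothetical table holding every id that should
-- appear in the result, whether or not it has events in the window.
select d.id,
       d.activity_date,
       count(distinct e.user_id) as daily_count
from (
  -- every id paired with every day of the last 90 days
  select a.id, g.dt::date as activity_date
  from all_ids a
    cross join generate_series(current_date - 90, current_date - 1, interval '1 day') as g(dt)
) d
  left join events e
    on e.created_at::date = d.activity_date
   and e.id = d.id
   and e.user_id <> 0
   and e.name = 'visited site'
group by d.id, d.activity_date
order by d.id, d.activity_date;
```

With this variant, an ID that has no events at all in the window still gets one row per day, each with a daily count of 0.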