Redshift – Monthly Summary of Active Users Based on Start/End Date Roster

redshift

I have a table that dictates who was on a project, from a startdate to an enddate. I'd like some help writing a query that will return the number of "active" users at the end of every month, since the start of this year.

DROP TABLE "public"."roster";
CREATE TABLE "public"."roster" ("id" int,"user_id" int,"project_id" int,"start_date" datetime,"end_date" datetime,"closed_date" datetime, PRIMARY KEY ("id"));

INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(1, 1, 1, '2019-05-27 00:00:00', '2021-01-15 00:00:00', NULL);

INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(2, 2, 2, '2020-05-27 00:00:00', '2021-02-01 00:00:00', '2021-02-05 00:00:00');

INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(3, 3, 3, '2020-05-27 00:00:00', '2024-02-01 00:00:00', '2021-02-05 00:00:00');

INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(4, 4, 4, '2020-05-27 00:00:00', '2021-03-05 00:00:00', NULL);
id  user_id project_id  start_date            end_date              closed_date
1   1       1           2019-05-27 00:00:00   2021-01-15 00:00:00   NULL
2   2       2           2020-05-27 00:00:00   2021-02-01 00:00:00   2020-02-05 00:00:00
3   3       3           2020-05-27 00:00:00   2024-02-01 00:00:00   2020-02-05 00:00:00
4   4       4           2020-05-27 00:00:00   2021-03-05 00:00:00   NULL

And the result would be something that shows how many distinct users had an active project within each month (only since the start of this year).

So for the dataset above, we can see that all 4 projects were "active" in the month January 2021, because the end date is in the future.

2021-01-31 | 3 (4 projects were active in some way, during January)
2021-02-28 | 2 (3 projects were active in some way, during February)
2021-03-31 | 1 (1 project was active in some way, during March)

The last bit of complexity is that sometimes a project can be closed before the EndDate and I'd like to exclude any users who's endDate is in the future but the project has actually closed.

For example in the dataset above, the third project has an end date of Feb 2024, however the project closed in Feb 2021. So technically the person was active in 2021 Jan and Feb, but not March 2021 and beyond.

p.s Hoping to get an answer for Redshift

Best Answer

p.s Hoping to get an answer in Postgres (I'm using Redshift)

Solution for PostgreSQL:

WITH cte AS ( SELECT '2021-01-01'::DATE AS month_start, '2021-01-31'::DATE AS month_end UNION ALL
              SELECT '2021-02-01', '2021-02-28' UNION ALL
              SELECT '2021-03-01', '2021-03-31' )

SELECT cte.month_end, COUNT(*)
FROM roster
CROSS JOIN cte
WHERE start_date <= month_end 
  AND LEAST(end_date, closed_date) >= month_start
GROUP BY cte.month_end
ORDER BY cte.month_end;

or

WITH cte AS ( SELECT '2021-01-31'::DATE AS month_end UNION ALL
              SELECT '2021-02-28' UNION ALL
              SELECT '2021-03-31' )

SELECT cte.month_end, COUNT(*)
FROM roster
CROSS JOIN cte
WHERE start_date <= month_end 
  AND LEAST(end_date, closed_date) >= DATE_TRUNC('month', month_end)
GROUP BY cte.month_end
ORDER BY cte.month_end;

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=710260e9fa44a89cc9c2d536739f7c92

Replace cte code with the code which you need (generate first and last days of months for the months in interest in the first variant or only months last days in the second variant). For example, use dates generator, start from YYYY-MM-01 of the next month for a month in interest, step as a month, and substract - a month for first day of month and a day for the last day.