I have a table that dictates who was on a project, from a startdate to an enddate. I'd like some help writing a query that will return the number of "active" users at the end of every month, since the start of this year.
DROP TABLE "public"."roster";
CREATE TABLE "public"."roster" ("id" int,"user_id" int,"project_id" int,"start_date" datetime,"end_date" datetime,"closed_date" datetime, PRIMARY KEY ("id"));
INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(1, 1, 1, '2019-05-27 00:00:00', '2021-01-15 00:00:00', NULL);
INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(2, 2, 2, '2020-05-27 00:00:00', '2021-02-01 00:00:00', '2021-02-05 00:00:00');
INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(3, 3, 3, '2020-05-27 00:00:00', '2024-02-01 00:00:00', '2021-02-05 00:00:00');
INSERT INTO "public"."roster" ("id", "user_id", "project_id", "start_date", "end_date", "closed_date") VALUES
(4, 4, 4, '2020-05-27 00:00:00', '2021-03-05 00:00:00', NULL);
id user_id project_id start_date end_date closed_date
1 1 1 2019-05-27 00:00:00 2021-01-15 00:00:00 NULL
2 2 2 2020-05-27 00:00:00 2021-02-01 00:00:00 2020-02-05 00:00:00
3 3 3 2020-05-27 00:00:00 2024-02-01 00:00:00 2020-02-05 00:00:00
4 4 4 2020-05-27 00:00:00 2021-03-05 00:00:00 NULL
And the result would be something that shows how many distinct users had an active project within each month (only since the start of this year).
So for the dataset above, we can see that all 4 projects were "active" in the month January 2021, because the end date is in the future.
2021-01-31 | 3 (4 projects were active in some way, during January)
2021-02-28 | 2 (3 projects were active in some way, during February)
2021-03-31 | 1 (1 project was active in some way, during March)
The last bit of complexity is that sometimes a project can be closed before the EndDate and I'd like to exclude any users who's endDate is in the future but the project has actually closed.
For example in the dataset above, the third project has an end date of Feb 2024, however the project closed in Feb 2021. So technically the person was active in 2021 Jan and Feb, but not March 2021 and beyond.
p.s Hoping to get an answer for Redshift
Best Answer
Solution for PostgreSQL:
or
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=710260e9fa44a89cc9c2d536739f7c92
Replace
cte
code with the code which you need (generate first and last days of months for the months in interest in the first variant or only months last days in the second variant). For example, use dates generator, start fromYYYY-MM-01
of the next month for a month in interest, step as a month, and substract - a month for first day of month and a day for the last day.