Sql-server – How to Calculate Month over Month User Retention based on already active users (not based on user signup date)

MySQLoraclepostgresqlsql server

I have a Table which tracks User activity (i-e user started a session at what time ?). This table contains the data from Dec 2018 Till now. I need to calculate monthly retention (not based on signup date) based on user Activity( i-e In December 2018, 500 users were active. Then, how many of them were active in Jan,Feb,Mar….Till Now?Same activity should be performed for Jan 2019,Feb 2019… till now users).

I have tried the hard coded way which is to get the users of Dec 2018 in on Table and then Get the Jan 2019 users in other Table and join both tables based on the user_ids, but for that purpose i have to write a lot of joins. Need a dynamic way to check month over month user retention and for all months after Dec 2018(because the data is available from this month onward).

Postrgesql Code #

select A.year_month_id,count(distinct A.user_id) as November_Users,count(distinct B.user_id) as December_Retained_Users 
FROM (
        select date_trunc('month', ua.created_at) as monthly,
        ua.user AS user_id
        FROM     user_activity ua
        WHERE    ua.event_type='StartSession'
        and     cast(ua.created_at as date) between cast('20181201' as date) and cast('20181231' as date)
        GROUP BY 1,2
    ) AS A
left Join 
    (
        select date_trunc('month', ua.created_at) as monthly,
        ua.user AS user_id
        FROM     user_activity ua
        WHERE    ua.event_type='StartSession'
        and     cast(ua.created_at as date) between cast('20190101' as date) and cast('20190131' as date)
        GROUP BY 1,2
    ) AS B 
on A.user_id=B.user_id
group by 1

user_activity Table #

id | user | event_type   | created_at
1  | A1   | StartSession | April 29, 2019, 3:59 AM
2  | A2   | StartSession | December 29, 2018, 1:07 AM
3  | A3   | StartSession | December 9, 2018, 4:59 PM
49 | A31  | StartSession | May 25, 2019, 11:59 AM
100| A46  | StartSession | April 29, 2019, 3:56 AM

Expected Output #

Month |Monthly_Active_Users| Jan_Retained|Feb_Retained|Mar_Retained|.......
Dec   | 500                |  300        |  200       | 330
Jan   | 700                |  N/A        |  450       | 410
Feb   | 1000               |  N/A        |  N/A       | 820
Mar   | 920                |  N/A        |  N/A       | N/A
.
.
.
.
Aug   | 100                | N/A         |    N/A     | N/A

Best Answer

I am not sure I understood your requires completely, but anyway.

In Postgres I would probably do something like this:

with per_month as (
  select to_char(a1.created_at, 'yyyy-mm') as month,
         array_agg(distinct a1.user_id) as active_users
  from user_activity a1
  where a1.created_at >= timestamp '2018-12-01'
  group by month
), m2 as (
  select month, 
         active_users,
         lead(active_users) over (order by month) as january_users,
         lead(active_users,2) over (order by month) as february_users,
         lead(active_users,3) over (order by month) as march_users,
         lead(active_users,4) over (order by month) as april_users
  from per_month 
)
select month,
       cardinality(active_users) as num_users,
       (select count(*) from (select unnest(active_users) intersect select unnest(january_users)) t)  as january_retained,
       (select count(*) from (select unnest(active_users) intersect select unnest(february_users)) t)  as febrary_retained,
       (select count(*) from (select unnest(active_users) intersect select unnest(march_users)) t)  as march_retained
from m2;

The first CTE simply aggregates all users per month into an array. The second CTE then simply creates on column per month with the user of that month using a window function.

And the final query then calculates the intersection between the users of the starting month with each of the following months by unnesting the arrays and counting the size of the intersection.

Online example: https://rextester.com/VYNSFX1788 (note that it generates random data, so each time you run it you get a different result)