Problem
I need to create a chart of User Retention over time, similar to this:
Ignoring the percenages for a minute, I have a query that shows unique users for a given "cohort" and then the number of users that come back. However, with the volume of data we've acquired over the last few weeks, the query no longer finishes.
Query
;WITH dates AS
(
-- Set up the date range
SELECT convert(date,GETDATE()) as dt, 1 as Id
UNION ALL
SELECT DATEADD(dd,-1,dt),dates.Id - 1
FROM dates
WHERE Id >= -84
)
, cohort as (
-- create the cohorts
SELECT dt AS StartDate,
convert(date,CASE WHEN DATEADD(DD, 6, dt) > convert(date,GETDATE()) THEN convert(date,GETDATE()) ELSE DATEADD(DD, 6, dt) END) as EndDate,
CONCAT(FORMAT(dt, 'MMM dd'), ' - ', FORMAT(CASE WHEN DATEADD(DD, 6, dt) > GETDATE() THEN GETDATE() ELSE DATEADD(DD, 6, dt) END, 'MMM dd')) as Cohort,
row_number() over (order by dt) as CohortNo
FROM dates A
WHERE DATEPART(dw,dt)=1
)
, cohortevent as (
-- The complete set of cohorts and their events
select c.*, e.*
from cohort c
left join Event e on e.eventtime between c.StartDate and C.EndDate
)
, Retained as(
-- Recursive CTE that works out how long each user has been retained
select c.StartDate,c.EndDate,c.CoHort,c.CohortNo,c.EventId,c.EventTime,c.Count,c.UserID, case when Userid is not null then 1 else 0 end as ret
from cohortevent c
union all
select c.StartDate,c.EndDate,c.CoHort,c.CohortNo,c.EventId,c.EventTime,c.Count,c.UserID, ret+1
from cohortevent c
join Retained on Retained.userid=c.userid and Retained.CohortNo=c.CohortNo-1 and Retained.eventid<c.eventid
)
, WeeksRetained as (
-- Get the highest number of weeks, which is the actual number per user (could probably be combined with previous CTE)
select StartDate, Enddate, Cohort, userID,
case when max(ret)=1 then '<1W' else '+'+convert(varchar,max(ret)-1)+'W' end as Weeks
from Retained
group by StartDate, Enddate, Cohort,userid
)
-- Finally pivot this by the number of weeks
select *
from
(
select StartDate, EndDate, Cohort, Weeks, count(distinct userID) as UserCount
from WeeksRetained
group by StartDate, EndDate, Cohort, Weeks
) src
pivot
(
sum(UserCount)
for Weeks in ([<1W], [+1W], [+2W], [+3W], [+4W], [+5W], [+6W], [+7W], [+8W], [+9W], [+10W], [+11W], [+12W])
) piv
OPTION (MAXRECURSION 0);
Environment
All of the tables are CTE's except for "Event" which has two main columns we care about, UserId and EventTime.
What I've tried
I've added indexes on both UserId and EventTime. I noticed the DTUs (this is an Azure SQL instance) were maxing out originally, but I've vertically scaled the database instance so the DB runs at 70% DTU usage and it's still not completing in 30+ minutes. There are currently only 40k rows in Event
.
Best Answer
Testing
Based on your estimated execution plan I tried to get some sample data and get an actual execution of your query. Remember that while I am trying to get closer to your issue, YMMV.
This is the actual plan that came out.
While not all operators & plan choices are the same, some parts look like they could match. One of these are the high amount of spools & filters with high row counts:
The querytimestats here show as
60
secondsThe performance can differ between separate executions and joining , filtering, ... on the execution of these date tables. It is much easier to calculate estimates on one small part than on everything at once.
Now, If I simply create a temporary table, and insert the results of the
cohortevent
cte into this temporary table to split up the work:&
The result is a query plan with less spools & better estimates due to not evaluating everything at once. You are kind of giving the optimizer a breather by splitting the query in two parts.
Actual Execution plan With the query time stats:
~ 6 - 7 seconds
Adding one more temp table on the
cohort
cte gives me the execution time I would desire.Actual execution plan With querytimestats=
&
&
~3 seconds
With the resulting query:
This will not be the best performing version of your query but this should resolve the issue with your spools going crazy. You should also investigate other workarounds like using a calendar table like @DanGuzman mentioned.