In MySQL 5.7, I have a table user
that contains the columns registered
and lastLogin
. My goal is to group the users by calendar months of registration and then select for each group how many of these users were still active after 1 month, after 2 months and so on.
20,000 users have registered in June 2015, from these 10,000 have been active in July 2015 or later, 5,000 have been active in August 2015 or later, etc. Continue this output for all users registered in the past months until today.
I understand how I can select any of the activity months by hand, for example select all users that were active one calendar month after their registration:
SELECT year(registered), month(registered), count(id)
FROM user
WHERE lastLogin >= DATE_ADD(DATE_SUB(date(registered), INTERVAL DAYOFMONTH(registered) - 1 DAY), INTERVAL 1 MONTH)
GROUP BY year(registered), month(registered)
This would give me the number of users who were active one month after their registration or later.
But in addition I would also like to select the number of users who were active 2 months after their registration, 3 months after their registration and so on up to the current month.
The results would be something like this (not necessarily look like this, but it should get the idea across):
Columns: Cohort | total | 1 month | 2months | 3months | ... | month n
07/2015 | 20,000 | 10,100 | 5,500 | 2,600 | ... | 150
08/2015 | 21,000 | 9,800 | 4,300 | 1,300 | ... | 180
09/2015 | 19,700 | 11,400 | 6,200 | 3,500 | ... | 200
...
08/2017 | 25,300 | 13,000
09/2017 | 12,500
How do I change the query to select all months of activity instead of a specific month?
Best Answer
With the help of Rick to look for pivot table I was able to create the final query as described in this post: https://stackoverflow.com/questions/7674786/mysql-pivot-table
Query for a 6 months analysis:
The part
DATE_SUB(DATE(registered), INTERVAL DAYOFMONTH(registered) - 1 DAY)
will return the first of a month to make sure the analysis is based on calendar months. I am not sure if it could be simplified. I was looking atSTR_TO_DATE(LEFT(registered, 7), "%y-%m")
but I was reading working on Strings is slower and should be avoided. If someone knows a better / cleaner way of doing that, please comment and I will include it in the answer.Here is some exemplary output of the query: