MySQL – Select Consecutive Rows with Same Values

gaps-and-islandsMySQL

I have a table with the following data:

userID  tStamp                status
------  -------------------   ------
Jason   2017-10-18 03:20:00   idle  
Brown   2017-10-18 03:20:28   idle  
Brown   2017-10-18 03:25:28   idle  
Brown   2017-10-18 04:00:28   active    
Brown   2017-10-18 04:10:28   active    
Brown   2017-10-18 04:35:28   idle  
Brown   2017-10-18 04:45:28   idle  

I want to extract the consecutive rows with the same status. For example, I want to see the time spent by a user in the 'idle' status, followed by 'active' status, followed by 'idle' status, and so on.

How do I do this in a single SQL query?

My desired output is the following:

userID        staus          Duration_in_this_status (min)
------  -------------------   ------
Jason         idle             ---  
Brown         idle              5
Brown         active           10   
Brown         idle             10   

Best Answer

SELECT userID, status, TIMESTAMPDIFF(minute, MIN(tStamp), MAX(tStamp)) AS duration
FROM (
    SELECT
    t.*
    , @groupNumber := IF(@prev_userID != userID OR @prev_status != status, @groupNumber + 1, @groupNumber) AS gn
    , @prev_userID := userID
    , @prev_status := status
    FROM t
    , (SELECT @groupNumber := 0, @prev_userID := NULL, @prev_status := NULL) var_init_subquery
    ORDER BY userID, tStamp
) sq
GROUP BY gn, userID, status

Here's how it works. We define three variables. One holding a group number and two holding the values of the previous row values of status and userId. Note, that in a relational database there is no order unless you specify it. This is very important. In the select clause, we first check, if the variable values differ from the current row. If yes, we increment the group number, if not we leave it as it is. After that, we assign the values of the current row. So when the variables are evaluated when the next row is processed, they still hold the values of the previous row. So the order is important here, too. In the outer query we simply can group by this group number to get the min and max values of the timestamp.

  • read more about user-defined variables here