MySQL – Select Consecutive Rows with Same Values

gaps-and-islandsMySQL

I have a table with the following data:

userID  tStamp                status
------  -------------------   ------
Jason   2017-10-18 03:20:00   idle  
Brown   2017-10-18 03:20:28   idle  
Brown   2017-10-18 03:25:28   idle  
Brown   2017-10-18 04:00:28   active    
Brown   2017-10-18 04:10:28   active    
Brown   2017-10-18 04:35:28   idle  
Brown   2017-10-18 04:45:28   idle

I want to extract the consecutive rows with the same status. For example, I want to see the time spent by a user in the 'idle' status, followed by 'active' status, followed by 'idle' status, and so on.

How do I do this in a single SQL query?

My desired output is the following:

userID        staus          Duration_in_this_status (min)
------  -------------------   ------
Jason         idle             ---  
Brown         idle              5
Brown         active           10   
Brown         idle             10

Best Answer

SELECT userID, status, TIMESTAMPDIFF(minute, MIN(tStamp), MAX(tStamp)) AS duration
FROM (
    SELECT
    t.*
    , @groupNumber := IF(@prev_userID != userID OR @prev_status != status, @groupNumber + 1, @groupNumber) AS gn
    , @prev_userID := userID
    , @prev_status := status
    FROM t
    , (SELECT @groupNumber := 0, @prev_userID := NULL, @prev_status := NULL) var_init_subquery
    ORDER BY userID, tStamp
) sq
GROUP BY gn, userID, status

see it working live in an sqlfiddle

Here's how it works. We define three variables. One holding a group number and two holding the values of the previous row values of status and userId. Note, that in a relational database there is no order unless you specify it. This is very important. In the select clause, we first check, if the variable values differ from the current row. If yes, we increment the group number, if not we leave it as it is. After that, we assign the values of the current row. So when the variables are evaluated when the next row is processed, they still hold the values of the previous row. So the order is important here, too. In the outer query we simply can group by this group number to get the min and max values of the timestamp.

read more about user-defined variables here

Related Solutions

Mysql – Count consecutive rows in thesql

Since gadget_id is a vehicle, you need to monitor two things as you look at each row

when a gadget_id switches location
when a gadget_id switches to another gadget_id

The solution lies in organizing a set of user variables to monitor that change. Please forgive you are about to see:

First, let's load your data in the test database in a table called gadget_location:

mysql> use test
Database changed
mysql> drop table gadget_location;
Query OK, 0 rows affected (0.07 sec)

mysql> create table gadget_location
    -> (
    ->     id int not null auto_increment,
    ->     gadget_id int,
    ->     location  varchar(30),
    ->     submitted_date date,
    ->     primary key (id)
    -> );
Query OK, 0 rows affected (0.13 sec)

mysql> insert into gadget_location (gadget_id,location,submitted_date) values
    -> (1,'calicut' , '2012-07-15'), (1,'calicut' , '2012-07-14'),
    -> (1,'calicut' , '2012-07-13'), (2,'thrissur', '2012-07-12'),
    -> (1,'calicut' , '2012-07-11'), (1,'kannur'  , '2012-07-10'),
    -> (2,'thrissur', '2012-07-09'), (1,'calicut' , '2012-07-08');
Query OK, 8 rows affected (0.05 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> select * from gadget_location;
+----+-----------+----------+----------------+
| id | gadget_id | location | submitted_date |
+----+-----------+----------+----------------+
|  1 |         1 | calicut  | 2012-07-15     |
|  2 |         1 | calicut  | 2012-07-14     |
|  3 |         1 | calicut  | 2012-07-13     |
|  4 |         2 | thrissur | 2012-07-12     |
|  5 |         1 | calicut  | 2012-07-11     |
|  6 |         1 | kannur   | 2012-07-10     |
|  7 |         2 | thrissur | 2012-07-09     |
|  8 |         1 | calicut  | 2012-07-08     |
+----+-----------+----------+----------------+
8 rows in set (0.00 sec)

mysql>

OK, here comes the mess:

SET @dupcount = 0;
SET @group_number = 0;
SET @cur_gadget_id = 0;
SET @cur_location = MD5("1");
SET @cur_gadget_location = MD5("1");
SELECT gadget_id,location,GroupNumber,COUNT(1) DaysStopped FROM
(
SELECT
    *,
    @dupcount     := IF(@cur_gadget_location=gadget_location,@dupcount+1,1) Dup_Count,
    @group_number := IF(@cur_gadget_location=gadget_location,@group_number,@group_number+1) GroupNumber,
    @cur_gadget_location := gadget_location
FROM
(
    SELECT *,CONCAT(gadget_id,'-',cur_loc) gadget_location FROM
    (
        SELECT *,
        @cur_location  := IF(MD5(location)=@cur_location,@cur_location,MD5(location)) cur_loc,
        @cur_gadget_id := IF(gadget_id=@cur_gadget_id,@cur_gadget_id,gadget_id) cur_gadget
        FROM gadget_location
    ) AAA
) AA ) A GROUP BY gadget_id,location,GroupNumber;

Want to see it work ??? Here it goes:

mysql> SET @dupcount = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @group_number = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @cur_gadget_id = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @cur_location = MD5("1");
Query OK, 0 rows affected (0.00 sec)

mysql> SET @cur_gadget_location = MD5("1");
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT gadget_id,location,GroupNumber,COUNT(1) DaysStopped FROM
    -> (
    -> SELECT
    ->     *,
    ->     @dupcount     := IF(@cur_gadget_location=gadget_location,@dupcount+1,1) Dup_Count,
    ->     @group_number := IF(@cur_gadget_location=gadget_location,@group_number,@group_number+1) GroupNumber,
    ->     @cur_gadget_location := gadget_location
    -> FROM
    -> (
    ->     SELECT *,CONCAT(gadget_id,'-',cur_loc) gadget_location FROM
    ->     (
    ->         SELECT *,
    ->         @cur_location  := IF(MD5(location)=@cur_location,@cur_location,MD5(location)) cur_loc,
    ->         @cur_gadget_id := IF(gadget_id=@cur_gadget_id,@cur_gadget_id,gadget_id) cur_gadget
    ->         FROM gadget_location
    ->     ) AAA
    -> ) AA ) A GROUP BY gadget_id,location,GroupNumber;
+-----------+----------+-------------+-------------+
| gadget_id | location | GroupNumber | DaysStopped |
+-----------+----------+-------------+-------------+
|         1 | calicut  |           1 |           3 |
|         1 | calicut  |           3 |           1 |
|         1 | calicut  |           6 |           1 |
|         1 | kannur   |           4 |           1 |
|         2 | thrissur |           2 |           1 |
|         2 | thrissur |           5 |           1 |
+-----------+----------+-------------+-------------+
6 rows in set (0.02 sec)

mysql>

According to this output, here is what you have:

Gadget 1

Group 1 : stopped at calicut for 3 days
Group 3 : left and came back for 1 day
Group 4 : left calicut and went to kannur for 1 day
Group 6 : left kannur and went to calicut for 1 day

Gadget 2

Group 2 : stopped for thrissur for 1 day
Group 5 : left and came back for 1 day

For some reason, the GroupNumbers came out different in SQLFiddle`. Notwithstanding, the rest of the output is the same.

I hope this is right ...

MySQL 5.5 – Find Total Duration of Each Consecutive Series of Rows

Query 1, tested in SQLFiddle-1

SET @ward_id_to_check = 1 ;

SELECT
    st.patient_id,
    st.bed_id AS starting_bed_id,          -- the first bed a patient uses
                                           -- can be omitted
    st.admitted,
    MIN(en.discharged) AS discharged
FROM
  ( SELECT patient_id, bed_id, admitted, discharged
    FROM t 
    WHERE t.ward_id = @ward_id_to_check
      AND NOT EXISTS
          ( SELECT * 
            FROM t AS prev 
            WHERE prev.ward_id = @ward_id_to_check
              AND prev.patient_id = t.patient_id
              AND prev.discharged = t.admitted
          )
  ) AS st
JOIN
  ( SELECT patient_id, admitted, discharged
    FROM t 
    WHERE t.ward_id = @ward_id_to_check
      AND NOT EXISTS
          ( SELECT * 
            FROM t AS next 
            WHERE next.ward_id = @ward_id_to_check
              AND next.patient_id = t.patient_id
              AND next.admitted = t.discharged
          )
  ) AS en
    ON  st.patient_id = en.patient_id
    AND st.admitted <= en.admitted
GROUP BY
    st.patient_id,
    st.admitted ;

Query 2, which is the same as 1 but without the derived tables. This will probably have better execution plan, with proper indexes. Test in SQLFiddle-2:

SET @ward_id_to_check = 1 ;

SELECT
    st.patient_id,
    st.bed_id AS starting_bed_id,
    st.admitted,
    MIN(en.discharged) AS discharged
FROM
    t AS st    -- starting period
  JOIN
    t AS en    -- ending period
      ON  en.ward_id = @ward_id_to_check
      AND st.patient_id = en.patient_id
      AND NOT EXISTS
          ( SELECT * 
            FROM t AS next 
            WHERE next.ward_id = @ward_id_to_check
              AND next.patient_id = en.patient_id
              AND next.admitted = en.discharged
          )
      AND st.admitted <= en.admitted
WHERE 
      st.ward_id = @ward_id_to_check
  AND NOT EXISTS
      ( SELECT * 
        FROM t AS prev 
        WHERE prev.ward_id = @ward_id_to_check
          AND prev.patient_id = st.patient_id
          AND prev.discharged = st.admitted
      )
GROUP BY
    st.patient_id,
    st.admitted ;

Both queries assume that there is a unique constraint on (patient_id, admitted). If the server runs with strict ANSI settings, the bed_id should be added in the GROUP BY list.