MySQL – Count consecutive rows in MySQL

MySQL, select

I would like to count consecutive rows in MySQL. The database is large, so I am trying to avoid joins. Here gadget_id identifies a vehicle, and every 20 to 30 seconds the vehicle sends its current location. I want to know how long a vehicle halts at a given location.

I will explain with the table below, which holds the latest data:

id  gadget_id  location  submitted_date
--  ---------  --------  --------------
1   1          calicut   2012-07-15
2   1          calicut   2012-07-14
3   1          calicut   2012-07-13
4   2          thrissur  2012-07-12
5   1          calicut   2012-07-11
6   1          kannur    2012-07-10
7   2          thrissur  2012-07-09
8   1          calicut   2012-07-08
…   …          …         …

*I want to know how long gadget_id 1 halts at location calicut.*

Here, for gadget_id = 1, the latest 4 rows all come from the same location, calicut. The next row for gadget_id = 1 is from kannur, so the count stops there. How can I get the count 4 when I give the inputs gadget_id = 1 and location = calicut?

Could anybody suggest a suitable query? I am hoping for a single query without joins.

Best Answer

Since gadget_id is a vehicle, you need to monitor two things as you look at each row:

when the location changes for a gadget_id
when the gadget_id changes from one row to the next
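
Those two checks can be sketched outside the database first. The snippet below is a minimal illustration in Python, not part of the original answer: the `rows` list holds the sample rows from the question in id order, and `runs` is a hypothetical helper name. It starts a new group whenever either the gadget or the location differs from the previous row.

```python
from itertools import groupby

# Sample rows from the question, in id order: (id, gadget_id, location)
rows = [
    (1, 1, "calicut"), (2, 1, "calicut"), (3, 1, "calicut"),
    (4, 2, "thrissur"), (5, 1, "calicut"), (6, 1, "kannur"),
    (7, 2, "thrissur"), (8, 1, "calicut"),
]


def runs(rows):
    """Return (group_number, gadget_id, location, row_count) per run.

    A new run starts whenever gadget_id or location differs from the
    previous row, which are the two changes to monitor.
    """
    result = []
    # groupby only merges adjacent rows with an equal key, so each
    # group is exactly one unbroken run.
    grouped = groupby(rows, key=lambda r: (r[1], r[2]))
    for number, ((gadget, location), members) in enumerate(grouped, start=1):
        result.append((number, gadget, location, len(list(members))))
    return result


for run in runs(rows):
    print(run)
```

On the sample data this yields six runs, the first being gadget 1 at calicut with 3 rows. Note that rows are compared strictly in table order, so a row from another gadget ends the current run.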

The solution lies in organizing a set of user variables to monitor those changes. Please forgive what you are about to see.

First, let's load your data in the test database in a table called gadget_location:

mysql> use test
Database changed
mysql> drop table gadget_location;
Query OK, 0 rows affected (0.07 sec)

mysql> create table gadget_location
    -> (
    ->     id int not null auto_increment,
    ->     gadget_id int,
    ->     location  varchar(30),
    ->     submitted_date date,
    ->     primary key (id)
    -> );
Query OK, 0 rows affected (0.13 sec)

mysql> insert into gadget_location (gadget_id,location,submitted_date) values
    -> (1,'calicut' , '2012-07-15'), (1,'calicut' , '2012-07-14'),
    -> (1,'calicut' , '2012-07-13'), (2,'thrissur', '2012-07-12'),
    -> (1,'calicut' , '2012-07-11'), (1,'kannur'  , '2012-07-10'),
    -> (2,'thrissur', '2012-07-09'), (1,'calicut' , '2012-07-08');
Query OK, 8 rows affected (0.05 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> select * from gadget_location;
+----+-----------+----------+----------------+
| id | gadget_id | location | submitted_date |
+----+-----------+----------+----------------+
|  1 |         1 | calicut  | 2012-07-15     |
|  2 |         1 | calicut  | 2012-07-14     |
|  3 |         1 | calicut  | 2012-07-13     |
|  4 |         2 | thrissur | 2012-07-12     |
|  5 |         1 | calicut  | 2012-07-11     |
|  6 |         1 | kannur   | 2012-07-10     |
|  7 |         2 | thrissur | 2012-07-09     |
|  8 |         1 | calicut  | 2012-07-08     |
+----+-----------+----------+----------------+
8 rows in set (0.00 sec)

mysql>

OK, here comes the mess:

SET @dupcount = 0;
SET @group_number = 0;
SET @cur_gadget_id = 0;
SET @cur_location = MD5("1");
SET @cur_gadget_location = MD5("1");
SELECT gadget_id,location,GroupNumber,COUNT(1) DaysStopped FROM
(
SELECT
    *,
    @dupcount     := IF(@cur_gadget_location=gadget_location,@dupcount+1,1) Dup_Count,
    @group_number := IF(@cur_gadget_location=gadget_location,@group_number,@group_number+1) GroupNumber,
    @cur_gadget_location := gadget_location
FROM
(
    SELECT *,CONCAT(gadget_id,'-',cur_loc) gadget_location FROM
    (
        SELECT *,
        @cur_location  := IF(MD5(location)=@cur_location,@cur_location,MD5(location)) cur_loc,
        @cur_gadget_id := IF(gadget_id=@cur_gadget_id,@cur_gadget_id,gadget_id) cur_gadget
        FROM gadget_location
    ) AAA
) AA ) A GROUP BY gadget_id,location,GroupNumber;

Want to see it work? Here it goes:

mysql> SET @dupcount = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @group_number = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @cur_gadget_id = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @cur_location = MD5("1");
Query OK, 0 rows affected (0.00 sec)

mysql> SET @cur_gadget_location = MD5("1");
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT gadget_id,location,GroupNumber,COUNT(1) DaysStopped FROM
    -> (
    -> SELECT
    ->     *,
    ->     @dupcount     := IF(@cur_gadget_location=gadget_location,@dupcount+1,1) Dup_Count,
    ->     @group_number := IF(@cur_gadget_location=gadget_location,@group_number,@group_number+1) GroupNumber,
    ->     @cur_gadget_location := gadget_location
    -> FROM
    -> (
    ->     SELECT *,CONCAT(gadget_id,'-',cur_loc) gadget_location FROM
    ->     (
    ->         SELECT *,
    ->         @cur_location  := IF(MD5(location)=@cur_location,@cur_location,MD5(location)) cur_loc,
    ->         @cur_gadget_id := IF(gadget_id=@cur_gadget_id,@cur_gadget_id,gadget_id) cur_gadget
    ->         FROM gadget_location
    ->     ) AAA
    -> ) AA ) A GROUP BY gadget_id,location,GroupNumber;
+-----------+----------+-------------+-------------+
| gadget_id | location | GroupNumber | DaysStopped |
+-----------+----------+-------------+-------------+
|         1 | calicut  |           1 |           3 |
|         1 | calicut  |           3 |           1 |
|         1 | calicut  |           6 |           1 |
|         1 | kannur   |           4 |           1 |
|         2 | thrissur |           2 |           1 |
|         2 | thrissur |           5 |           1 |
+-----------+----------+-------------+-------------+
6 rows in set (0.02 sec)

mysql>

According to this output, here is what you have:

Gadget 1

Group 1 : stopped at calicut for 3 days
Group 3 : left and came back for 1 day
Group 4 : left calicut and went to kannur for 1 day
Group 6 : left kannur and went to calicut for 1 day

Gadget 2

Group 2 : stopped at thrissur for 1 day
Group 5 : left and came back for 1 day

For some reason, the GroupNumbers came out different in SQLFiddle. Notwithstanding, the rest of the output is the same.

I hope this is right ...
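
One caveat: the question expected a count of 4 for gadget_id = 1 at calicut, while the output above reports runs of 3 and 1, because the gadget 2 row with id 4 sits between them. A hedged sketch of the per-gadget reading follows, written in Python purely for illustration (the function name `latest_halt_count` is hypothetical). It filters to one gadget first, then counts the newest rows at the requested location until the location changes.

```python
from itertools import takewhile

# Sample rows: (id, gadget_id, location, submitted_date)
rows = [
    (1, 1, "calicut", "2012-07-15"),
    (2, 1, "calicut", "2012-07-14"),
    (3, 1, "calicut", "2012-07-13"),
    (4, 2, "thrissur", "2012-07-12"),
    (5, 1, "calicut", "2012-07-11"),
    (6, 1, "kannur", "2012-07-10"),
    (7, 2, "thrissur", "2012-07-09"),
    (8, 1, "calicut", "2012-07-08"),
]


def latest_halt_count(rows, gadget_id, location):
    """Count the newest consecutive rows of one gadget at `location`.

    Rows from other gadgets are dropped first, so they cannot break
    the run.
    """
    own = sorted(
        (r for r in rows if r[1] == gadget_id),
        key=lambda r: r[3],  # ISO dates sort correctly as strings
        reverse=True,        # newest first
    )
    return sum(1 for _ in takewhile(lambda r: r[2] == location, own))


print(latest_halt_count(rows, 1, "calicut"))  # prints 4
```

In SQL terms this corresponds to restricting or ordering by gadget_id before detecting location changes, rather than comparing rows in plain table order.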

Related Solutions

Mysql – Count consecutive null rows from a joined table

This is essentially a gaps-and-islands problem. When I have my SQL Server hat on, I often solve this kind of problem with two ROW_NUMBER() calls. Sadly, MySQL before version 8.0, unlike many other major SQL products, does not support ROW_NUMBER(), nor any other ranking function. To make up for that, however, you can use variable assignment in SELECTs, which MySQL does support (unlike many other major SQL products).

Below is a solution followed by an explanation:

SELECT
  member_id,
  member_name,
  event_id,
  COUNT(*) AS consecutive_times_missed,
  MIN(event_date) AS first_date_missed,
  MAX(event_date) AS last_date_missed
FROM (
  SELECT
    member_id,
    member_name,
    event_id,
    event_date,
    is_missed,
    @occ_ranking := (event_id = @last_event) * (member_id = @last_member) * @occ_ranking + 1,
    @att_ranking := (event_id = @last_event) * (member_id = @last_member)
                                             * (is_missed = @last_missed) * @att_ranking + 1,
    @occ_ranking - @att_ranking AS grp,
    @last_member := member_id,
    @last_event  := event_id,
    @last_missed := is_missed
  FROM (
    SELECT
      m.member_id,
      m.member_name,
      e.event_id,
      e.event_date,
      (a.attendance_date IS NULL) AS is_missed
    FROM       members     m
    INNER JOIN event_dates e ON m.member_join_date <= e.event_date
    LEFT  JOIN attendance  a ON m.member_id = a.member_id
                            AND e.event_id = a.event_id
                            AND e.event_date = a.attendance_date,
    (
      SELECT
        @occ_ranking := 0,
        @att_ranking := 0,
        @last_member := 0,
        @last_event  := 0,
        @last_missed := 0
    ) v
    ORDER BY
      m.member_id,
      e.event_id,
      e.event_date
  ) s
) s
WHERE
  is_missed = 1
GROUP BY
  member_id,
  member_name,
  event_id,
  grp
HAVING
  COUNT(*) >= 3
;

Basically, you start with joining members and event_dates to get all the event occurrences the members could have attended based on their membership dates. Then you throw in the attendance table (via a left join) to flag the missed occurrences. Here's an example of what you get by this time:

member_id  event_id  event_date  is_missed
---------  --------  ----------  ---------
1          1         2012-07-10  0
2          1         2012-07-10  1
1          1         2012-07-14  0
2          1         2012-07-14  1
…          …         …           …

At this point, the resulting set needs to be sorted by member_id and event_id (and by event_date within each such pair), because that order is crucial to the subsequent rankings calculation.

Two different rankings are produced for every row. One is a ranking within the row's partition of (member_id, event_id) (it is reset as soon as a new event or a new member is encountered). The other is a ranking within the specific group of consecutive event occurrences, either attended or missed, that the row belongs to (this ranking, in addition to being reset upon coming across a new member or event, also gets reset whenever the other group is encountered). And so you get something like this:

member_id  event_id  event_date  is_missed  ranking1  ranking2
---------  --------  ----------  ---------  --------  --------
…          …         …           …          …         …
1          1         2012-07-27  0          4         4
1          1         2012-07-28  0          5         5
1          1         2012-07-29  1          6         1
2          1         2012-07-10  1          1         1
2          1         2012-07-14  1          2         2
2          1         2012-07-25  1          3         3
2          1         2012-07-27  0          4         1
…          …         …           …          …         …

As you may have noticed, the difference between the two ranking numbers is constant throughout the particular group of consecutive event occurrences of the same kind ("attended" or "missed") and is also unique for that group within its partition. Therefore, every such group can be identified by member_id, event_id and the just mentioned difference.
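
To make the constant-difference property concrete, here is a small Python sketch, for illustration only (`rank_rows` is a hypothetical helper, and the input is the is_missed column of the four member 2 rows shown in the table above). It recomputes both rankings for a single (member_id, event_id) partition along with their difference.

```python
def rank_rows(flags):
    """Yield (is_missed, ranking1, ranking2, grp) for one partition.

    flags: the is_missed values of a single (member_id, event_id)
    partition, already sorted by event_date.
    """
    ranking1 = ranking2 = 0
    last = None
    for is_missed in flags:
        # ranking1: position within the whole partition
        ranking1 += 1
        # ranking2: position within the current run of equal flags
        ranking2 = ranking2 + 1 if is_missed == last else 1
        last = is_missed
        yield is_missed, ranking1, ranking2, ranking1 - ranking2


# member 2, event 1: missed, missed, missed, attended
for row in rank_rows([1, 1, 1, 0]):
    print(row)
```

The three missed rows all get a difference of 0, and the attended row gets 3. The difference equals the number of rows that precede the group in its partition, which is why it stays constant inside a group and changes from one group to the next.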

And it now remains simply to filter the events leaving just the missed ones, group the rows and get the necessary aggregated data, like the number of rows and, possibly, as in the query above, the dates of the first and the last event occurrence in the group. The number of rows is also used in an additional filter condition to omit groups with fewer rows than required.
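
The filter, group and aggregate steps can be sketched the same way. The Python below is an illustration under stated assumptions, not the query itself: `missed_streaks` is a hypothetical name, it handles one (member_id, event_id) partition at a time, and the sample input is the member 2 rows from the rankings table.

```python
from collections import defaultdict


def missed_streaks(rows, min_len=3):
    """Return (count, first_date, last_date) per qualifying streak.

    rows: (event_date, is_missed) pairs for one (member_id, event_id)
    partition, already sorted by event_date.
    """
    groups = defaultdict(list)
    ranking1 = ranking2 = 0
    last = None
    for event_date, is_missed in rows:
        ranking1 += 1
        ranking2 = ranking2 + 1 if is_missed == last else 1
        last = is_missed
        if is_missed:  # mirrors WHERE is_missed = 1
            # mirrors GROUP BY grp, where grp = ranking1 - ranking2
            groups[ranking1 - ranking2].append(event_date)
    return [
        (len(dates), min(dates), max(dates))
        for dates in groups.values()
        if len(dates) >= min_len  # mirrors HAVING COUNT(*) >= 3
    ]


rows = [
    ("2012-07-10", 1),
    ("2012-07-14", 1),
    ("2012-07-25", 1),
    ("2012-07-27", 0),
]
print(missed_streaks(rows))  # [(3, '2012-07-10', '2012-07-25')]
```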

MySQL select count on same table slow subquery

What is the datatype of clldate and calldate? If they are both DATE, get rid of the function calls (DATE()) around them; it prevents use of indexes.

If either or both are DATETIME (or TIMESTAMP), then we need to rephrase the query to make it more efficient.

Add compound index:
INDEX(caller_id, calldate)

If calldate is DATE, then also add
INDEX(calldate, caller_id)

How big is cdr? How big will it eventually become?

Does the performance improve much if you get rid of SQL_CALC_FOUND_ROWS? Consider whether you really need it.

Will you be purging 'old' data?

After you have answered my questions, I will consider whether PARTITIONing is worth doing.