Mysql – identify gaps and islands

gaps-and-islandsMySQL

I have a database that has the following structure

+---------------------+------------+---------------+
| timestamp           | EngOilP_sd | CompOilLVL_sd |
+---------------------+------------+---------------+
| 2015-06-24 20:28:07 |          0 |             0 |
| 2015-06-24 20:30:20 |          1 |             0 |
| 2015-06-24 20:36:47 |          1 |             0 |
| 2015-06-24 20:41:11 |          1 |             0 |
| 2015-06-24 20:43:29 |          1 |             0 |
| 2015-06-24 20:45:42 |          0 |             0 |
| 2015-06-24 20:47:51 |          0 |             0 |
| 2015-06-24 20:49:59 |          0 |             1 |
| 2015-06-24 20:52:01 |          0 |             1 |
| 2015-06-24 20:54:17 |          0 |             0 |
+---------------------+------------+---------------+

I would like to generate a report like the following:

sd             duration      start                    stop
EngOilP_sd     15min 22s     2015-06-24 20:30:20      2015-06-24 20:45:42
CompOilLVL_sd  4min 18s      2015-06-24 20:49:59      2015-06-24 20:54:17

I should add that it is not critical that the stop record is the first one where the value = 0, it could be the last record where the value = 1

How might I easily create a well performing query that would give me the information I need? I am using Python for the app, so I would not need the SQL to output the report in this exact form, just get me close with the guts of the information.

Best Answer

There's probably something more elegant than this, but since you are doing a kind of pivot (transforming from columns to rows, i.e. metadata to data) it is never going to be pretty. Assuming a table like:

create table t
( ts timestamp not null
, EngOilP_sd smallint not null
, CompOilLVL_sd smallint not null
) engine = innodb;

insert into t (ts, EngOilP_sd, CompOilLVL_sd)
values 
 ('2015-06-24 20:28:07',0 ,0 )
,('2015-06-24 20:30:20',1 ,0 )
,('2015-06-24 20:36:47',1 ,0 )
,('2015-06-24 20:41:11',1 ,0 )
,('2015-06-24 20:43:29',1 ,0 )
,('2015-06-24 20:45:42',0 ,0 )
,('2015-06-24 20:47:51',0 ,0 )
,('2015-06-24 20:49:59',0 ,1 )
,('2015-06-24 20:52:01',0 ,1 )
,('2015-06-24 20:54:17',0 ,0 );

Since we are going to reference the "pivot" several times we create a view for it:

create view t_pivot as 
    select 'EngOilP_sd' as sd, ts, EngOilP_sd as val from t 
    union all 
    select 'CompOilLVL_sd', ts, CompOilLVL_sd from t;

The start time (using @Pauls comment as the definition) is the earliest time where each attribute is 1:

select sd, min(case val when 1 then ts end) as start
from t_pivot     
group by sd

The stop time is the earliest time after that where each attribute is 0:

select x.sd, x.start, min(case y.val when 0 then y.ts end) as stop
from (
    select sd, min(case val when 1 then ts end) as start
    from t_pivot     
    group by sd 
) as x 
join t_pivot as y
    on x.sd = y.sd
   and y.ts > x.start 
group by x.sd, x.start 

Adding another level of nesting makes it convenient to determine the duration:

select sd, timediff(stop, start) as duration, start, stop 
from ( 
    select x.sd, x.start, min(case y.val when 0 then y.ts end) as stop
    from (
        select sd, min(case val when 1 then ts end) as start
        from t_pivot     
        group by sd 
    ) as x 
    join t_pivot as y
        on x.sd = y.sd
       and y.ts > x.start 
    group by x.sd, x.start 
) as z 
order by sd desc;

+---------------+----------+---------------------+---------------------+
| sd            | duration | start               | stop                |
+---------------+----------+---------------------+---------------------+
| EngOilP_sd    | 00:15:22 | 2015-06-24 20:30:20 | 2015-06-24 20:45:42 |
| CompOilLVL_sd | 00:04:18 | 2015-06-24 20:49:59 | 2015-06-24 20:54:17 |
+---------------+----------+---------------------+---------------------+
2 rows in set (0.00 sec)