Mysql – identify gaps and islands

gaps-and-islandsMySQL

I have a database that has the following structure

+---------------------+------------+---------------+
| timestamp           | EngOilP_sd | CompOilLVL_sd |
+---------------------+------------+---------------+
| 2015-06-24 20:28:07 |          0 |             0 |
| 2015-06-24 20:30:20 |          1 |             0 |
| 2015-06-24 20:36:47 |          1 |             0 |
| 2015-06-24 20:41:11 |          1 |             0 |
| 2015-06-24 20:43:29 |          1 |             0 |
| 2015-06-24 20:45:42 |          0 |             0 |
| 2015-06-24 20:47:51 |          0 |             0 |
| 2015-06-24 20:49:59 |          0 |             1 |
| 2015-06-24 20:52:01 |          0 |             1 |
| 2015-06-24 20:54:17 |          0 |             0 |
+---------------------+------------+---------------+

I would like to generate a report like the following:

sd             duration      start                    stop
EngOilP_sd     15min 22s     2015-06-24 20:30:20      2015-06-24 20:45:42
CompOilLVL_sd  4min 18s      2015-06-24 20:49:59      2015-06-24 20:54:17

I should add that it is not critical that the stop record is the first one where the value = 0, it could be the last record where the value = 1

How might I easily create a well performing query that would give me the information I need? I am using Python for the app, so I would not need the SQL to output the report in this exact form, just get me close with the guts of the information.

Best Answer

There's probably something more elegant than this, but since you are doing a kind of pivot (transforming from columns to rows, i.e. metadata to data) it is never going to be pretty. Assuming a table like:

create table t
( ts timestamp not null
, EngOilP_sd smallint not null
, CompOilLVL_sd smallint not null
) engine = innodb;

insert into t (ts, EngOilP_sd, CompOilLVL_sd)
values 
 ('2015-06-24 20:28:07',0 ,0 )
,('2015-06-24 20:30:20',1 ,0 )
,('2015-06-24 20:36:47',1 ,0 )
,('2015-06-24 20:41:11',1 ,0 )
,('2015-06-24 20:43:29',1 ,0 )
,('2015-06-24 20:45:42',0 ,0 )
,('2015-06-24 20:47:51',0 ,0 )
,('2015-06-24 20:49:59',0 ,1 )
,('2015-06-24 20:52:01',0 ,1 )
,('2015-06-24 20:54:17',0 ,0 );

Since we are going to reference the "pivot" several times we create a view for it:

create view t_pivot as 
    select 'EngOilP_sd' as sd, ts, EngOilP_sd as val from t 
    union all 
    select 'CompOilLVL_sd', ts, CompOilLVL_sd from t;

The start time (using @Pauls comment as the definition) is the earliest time where each attribute is 1:

select sd, min(case val when 1 then ts end) as start
from t_pivot     
group by sd

The stop time is the earliest time after that where each attribute is 0:

select x.sd, x.start, min(case y.val when 0 then y.ts end) as stop
from (
    select sd, min(case val when 1 then ts end) as start
    from t_pivot     
    group by sd 
) as x 
join t_pivot as y
    on x.sd = y.sd
   and y.ts > x.start 
group by x.sd, x.start

Adding another level of nesting makes it convenient to determine the duration:

select sd, timediff(stop, start) as duration, start, stop 
from ( 
    select x.sd, x.start, min(case y.val when 0 then y.ts end) as stop
    from (
        select sd, min(case val when 1 then ts end) as start
        from t_pivot     
        group by sd 
    ) as x 
    join t_pivot as y
        on x.sd = y.sd
       and y.ts > x.start 
    group by x.sd, x.start 
) as z 
order by sd desc;

+---------------+----------+---------------------+---------------------+
| sd            | duration | start               | stop                |
+---------------+----------+---------------------+---------------------+
| EngOilP_sd    | 00:15:22 | 2015-06-24 20:30:20 | 2015-06-24 20:45:42 |
| CompOilLVL_sd | 00:04:18 | 2015-06-24 20:49:59 | 2015-06-24 20:54:17 |
+---------------+----------+---------------------+---------------------+
2 rows in set (0.00 sec)

Related Solutions

Sql-server – Gaps and islands: client solution vs T-SQL query

And a 1 second solution...

;WITH cteSource(StartedAt, FinishedAt)
AS (
    SELECT      s.StartedAt,
            e.FinishedAt
    FROM        (
                SELECT  StartedAt,
                    ROW_NUMBER() OVER (ORDER BY StartedAt) AS rn
                FROM    dbo.Tasks
            ) AS s
    INNER JOIN  (
                SELECT  FinishedAt,
                    ROW_NUMBER() OVER (ORDER BY FinishedAt) + 1 AS rn
                FROM    dbo.Tasks
            ) AS e ON e.rn = s.rn
    WHERE       s.StartedAt > e.FinishedAt

    UNION ALL

    SELECT  MIN(StartedAt),
        MAX(FinishedAt)
    FROM    dbo.Tasks
), cteGrouped(theTime, grp)
AS (
    SELECT  u.theTime,
        (ROW_NUMBER() OVER (ORDER BY u.theTime) - 1) / 2
    FROM    cteSource AS s
    UNPIVOT (
            theTime
            FOR theColumn IN (s.StartedAt, s.FinishedAt)
        ) AS u
)
SELECT      MIN(theTime),
        MAX(theTime)
FROM        cteGrouped
GROUP BY    grp
ORDER BY    grp

Mysql – identify multiple gaps and islands for n columns

Sample data:

CREATE TABLE t
    (`ts` varchar(19), `EngOilP_sd` int, `CompOilLVL_sd` int)
;

INSERT INTO t
    (`ts`, `EngOilP_sd`, `CompOilLVL_sd`)
VALUES
    ('2015-06-24 20:28:07', 0, 0),
    ('2015-06-24 20:30:20', 1, 0),
    ('2015-06-24 20:36:47', 1, 1),
    ('2015-06-24 20:41:11', 1, 1),
    ('2015-06-24 20:43:29', 1, 1),
    ('2015-06-24 20:45:42', 1, 0),
    ('2015-06-24 20:47:51', 0, 0),
    ('2015-06-24 20:49:59', 0, 1),
    ('2015-06-24 20:52:01', 1, 1),
    ('2015-06-24 20:53:01', 1, 1),
    ('2015-06-24 20:54:01', 1, 1),
    ('2015-06-24 20:56:01', 1, 1),
    ('2015-06-24 20:57:01', 1, 1),
    ('2015-06-24 20:58:01', 1, 1),
    ('2015-06-24 21:02:01', 1, 1),
    ('2015-06-24 21:32:01', 0, 1),
    ('2015-06-24 21:52:01', 0, 0),
    ('2015-06-24 22:12:01', 0, 0),
    ('2015-06-24 22:20:01', 0, 0),
    ('2015-06-24 22:49:01', 0, 1),
    ('2015-06-24 23:52:01', 0, 1),
    ('2015-06-24 23:54:17', 0, 0)
;

create view t_pivot as
    select 'EngOilP_sd' as sd, ts, EngOilP_sd as val from t
    union all
    select 'CompOilLVL_sd', ts, CompOilLVL_sd from t;

Query:

SELECT sd, TIMEDIFF(MAX(ts), MIN(ts)) AS duration, MIN(ts) AS startime, MAX(ts) AS endtime
FROM (
    SELECT 
    tp.*
    , @val_change := IF(@prev_val != tp.val, @val_change + 1, @val_change) AS vc
    , @prev_val := tp.val
    FROM
    t_pivot tp
    , (SELECT @prev_val := NULL, @val_change := 0) var_init_subquery
    ORDER BY sd, ts
) sq
WHERE val = 1
GROUP BY sd, vc;

Result:

|            sd |                  duration |            startime |             endtime |
|---------------|---------------------------|---------------------|---------------------|
| CompOilLVL_sd | January, 01 1970 00:06:42 | 2015-06-24 20:36:47 | 2015-06-24 20:43:29 |
| CompOilLVL_sd | January, 01 1970 00:42:02 | 2015-06-24 20:49:59 | 2015-06-24 21:32:01 |
| CompOilLVL_sd | January, 01 1970 01:03:00 | 2015-06-24 22:49:01 | 2015-06-24 23:52:01 |
|    EngOilP_sd | January, 01 1970 00:15:22 | 2015-06-24 20:30:20 | 2015-06-24 20:45:42 |
|    EngOilP_sd | January, 01 1970 00:10:00 | 2015-06-24 20:52:01 | 2015-06-24 21:02:01 |

See it working live in an sqlfiddle

Explanation:

With the subquery in the FROM clause we are initializing our variables. The ORDER BY is very important! Here's why: In the SELECT clause each line is processed. The order in the SELECT clause is also important. Here

, @val_change := IF(@prev_val != tp.val, @val_change + 1, @val_change) AS vc

we increment our variable @val_change if the value in the column val differs from the "previous" row (that's why ORDER BY is so important). The "previous row" is in the variable @prev_val. The value of the current row is assigned to it in this line

, @prev_val := tp.val

Since MySQL processes each line in the SELECT clause in the order the SELECT clause is written, the @prev_val variable holds the value of the previous row (or NULL when no row has been processed yet) in the previous line of the SELECT clause.

Now all we have to do is GROUP BY sd and this @val_change variable in the outer query.

Read more about how to use user-defined variables in the manual here.

CAVEAT:

This will result in a full table scan.

sorry for bad english, hope it's understandable

UPDATED QUESTION:

The query still is the same. You just have to adjust the view t_pivot

create view t_pivot as
    select 'EngOilP_sd' as sd, ts, EngOilP_sd as val from t
    union all
    select 'CompOilLVL_sd', ts, CompOilLVL_sd from t
    union all
    select 'Brake_sd', ts, Brake_sd from t;

to include Brake_sd.

See it working live in an sqlfiddle

Best Answer

Related Solutions

Sql-server – Gaps and islands: client solution vs T-SQL query

Mysql – identify multiple gaps and islands for n columns

Related Question