MySQL – How to Sort Records Before Inter-Row Calculations

MySQLorder-bysubqueryview

I have a time sequence data set to analyze but the difficulty is that

  1. the data set is very big.
  2. the time sequence events are committed by different objects with affiliation.
  3. its in mysql, there is no id like index column to join

e.g.

----------------
dt     obj        comp
----------------
t1,    object_a,   component_1
t2,    object_b,   component_1
t3,    object_b,   component_2
t4,    object_a,   component_1
t5,    object_b,   component_1
t6,    object_b,   component_2
----------------

I am trying to know the delay between each commit
from each component of each object.
i.e.
obj_a.comp_1.delay1 = obj_a.compoent_1.t4 – obj_a.component_1.t1

etc.

I tried to join

select timediff( t1.dt, t.dt ) as delay
from  table as t 
join table as t1
on t1.comp = t.comp
and t1.obj = t.obj
and t1.dt = (
    select min(t2.dt)
    from table as t2
    where t2.obj = t.obj 
    and t2.comp = t.comp 
    and t2.dt > t.dt
)

which taking forever, so I am thinking if there is any way to sort on the results and do inter-row calculation with variables, it would be much faster.
but it failed every way with sub-query and views where the sort is always done after calculation. From what I read on order by, it seems impossible. So what options do I have to to achieve this reasonably fast?

To be more specific:
If I do the following:

SELECT  
    obj
    , comp
    , dt - @prev
    , @prev := dt
FROM 
   table
ORDER BY obj, comp, dt ASC

the result is

obja, comp1, t1-null
obja, comp1, t4-t3      # should be t4-t1
objb, comp1, t2-t1      
objb, comp1, t5-t4      # should be t5-t2
objb, comp2, t3-t2
objb, comp2, t6-t5      # should be t6-t3

Since the calculation happens on unsorted data, the results do not make sense. The table in my case is actually a view, the component is calculated field. It doesn't have an index. The join-subquery way could takes half an hour.
But if I save the sorted results into a table and calculate with variable, it only takes minutes.
The data set is very big, I only tested with a small subset.
I am not sure the method of saving to extra table is a scalable solution.
So I am looking for advise on alternatives.

Best Answer

I have managed to successfully use variables and sorting to solve your problem. This is my test set-up:

CREATE TABLE atable
    (`dt` datetime, `obj` varchar(8), `comp` varchar(11))
;

INSERT INTO atable
    (`dt`, `obj`, `comp`)
VALUES
    ('2016-09-13 06:00:01', 'object_a', 'component_1'),
    ('2016-09-13 06:00:02', 'object_b', 'component_1'),
    ('2016-09-13 06:00:04', 'object_b', 'component_2'),
    ('2016-09-13 06:00:08', 'object_a', 'component_1'),
    ('2016-09-13 06:00:16', 'object_b', 'component_1'),
    ('2016-09-13 06:00:32', 'object_b', 'component_2')
;

And this is the query:

SELECT
  dt,
  obj,
  comp,
  diff
FROM
  (SELECT @obj := '', @comp := '', @dt := CAST('1970-01-01' AS datetime)) AS var,
  (
    SELECT
      TIMESTAMPDIFF(SECOND, IF(obj = @obj AND comp = @comp, @dt, NULL), dt) AS diff,
      @dt := dt AS dt,
      @obj := obj AS obj,
      @comp := comp AS comp
    FROM
      atable
    ORDER BY
      obj,
      comp,
      dt
  ) AS sub
;

As this SQL Fiddle demo will show, the query returns the following output:

dt                   obj       comp         diff
-------------------  --------  -----------  ----
2016-09-13 06:00:01  object_a  component_1  NULL
2016-09-13 06:00:08  object_a  component_1  7
2016-09-13 06:00:02  object_b  component_1  NULL
2016-09-13 06:00:16  object_b  component_1  14
2016-09-13 06:00:04  object_b  component_2  NULL
2016-09-13 06:00:32  object_b  component_2  28