I have a time sequence data set to analyze but the difficulty is that
- the data set is very big.
- the time sequence events are committed by different objects with affiliation.
- its in mysql, there is no id like index column to join
e.g.
----------------
dt obj comp
----------------
t1, object_a, component_1
t2, object_b, component_1
t3, object_b, component_2
t4, object_a, component_1
t5, object_b, component_1
t6, object_b, component_2
----------------
I am trying to know the delay between each commit
from each component of each object.
i.e.
obj_a.comp_1.delay1 = obj_a.compoent_1.t4 – obj_a.component_1.t1
etc.
I tried to join
select timediff( t1.dt, t.dt ) as delay
from table as t
join table as t1
on t1.comp = t.comp
and t1.obj = t.obj
and t1.dt = (
select min(t2.dt)
from table as t2
where t2.obj = t.obj
and t2.comp = t.comp
and t2.dt > t.dt
)
which taking forever, so I am thinking if there is any way to sort on the results and do inter-row calculation with variables, it would be much faster.
but it failed every way with sub-query and views where the sort is always done after calculation. From what I read on order by
, it seems impossible. So what options do I have to to achieve this reasonably fast?
To be more specific:
If I do the following:
SELECT
obj
, comp
, dt - @prev
, @prev := dt
FROM
table
ORDER BY obj, comp, dt ASC
the result is
obja, comp1, t1-null
obja, comp1, t4-t3 # should be t4-t1
objb, comp1, t2-t1
objb, comp1, t5-t4 # should be t5-t2
objb, comp2, t3-t2
objb, comp2, t6-t5 # should be t6-t3
Since the calculation happens on unsorted data, the results do not make sense. The table in my case is actually a view, the component is calculated field. It doesn't have an index. The join-subquery way could takes half an hour.
But if I save the sorted results into a table and calculate with variable, it only takes minutes.
The data set is very big, I only tested with a small subset.
I am not sure the method of saving to extra table is a scalable solution.
So I am looking for advise on alternatives.
Best Answer
I have managed to successfully use variables and sorting to solve your problem. This is my test set-up:
And this is the query:
As this SQL Fiddle demo will show, the query returns the following output: