Convert Log table days in a state

optimizationoracle

This is simplified but represents the problem I am trying to solve. We have a table containing 5-10 million rows in a format similar to the following…

Input Table
Date    Item   Moved to Box
Oct-1   1      BoxA 
Oct-6   1      BoxB
Oct-8   1      BoxC
Oct-9   1      BoxB
Oct-16  1      BoxC
Oct-17  1      BoxD

And I am trying to convert it into this

Expected Output
Item    Box    Duration
1       BoxA   5
1       BoxB   9
1       BoxC   2
1       BoxD   *unimportant

*It doesn't matter what the query returns for BoxD (the only box move in without a box move out) as it is discarded.

For the example hopefully you can see that the input table represents a log of when an item moves from box to box and the expected output is cumulitively how long each item spends in each box.

My first thought was to do a table join with itself and do some date min/maxing for each record to try to find the exit date from a box, then sum the results but it seems pretty process intensive.

How would someone approach this in an efficient way?

Best Answer

select item, 
       sum(duration),
       moved_to
from (
  select item, 
         lead(move_date)  over (partition by item order by move_date) - move_date as duration,
         moved_to
  from movement         
) t
group by item, moved_to
order by item, moved_to;

SQLFiddle example: http://www.sqlfiddle.com/#!4/dff2c/1