Sql-server – Window Function to get pairs of records and compute the time difference

etloraclepostgresqlsql serverwindow functions

I'm working with an event dataset as the one in the example below:

Event_Type  Event_Timestamp         Is_Active
A           2010-10-01 00:00:00     1
B           2010-10-01 00:00:01     1
A           2010-10-01 00:00:02     0
D           2010-10-01 00:00:03     1
B           2010-10-01 00:00:04     0
C           2010-10-01 00:00:05     1
A           2010-10-01 00:00:06     1
A           2010-10-01 00:00:07     1
A           2010-10-01 00:00:08     0

The dataset is ordered by the timestamp of the events and the data grows as events happens in real time. More event types could be included in the dataset at any time (not limited to only A, B, C, D as in the example) and the same event type could appear in the table multiple times. The Is_Active boolean field serves as a way to indicate whether the event is still active (1) or not (0).

In this sense, I tried to do some transformations on this data by using SQL Window Functions. I'm not necessarily restricted to a specific product or technology, so feel free to tell how you would address the problem below.

What I want to do is dynamically pair each event of the same type when they have opposite Is_Active values and then get how long that event was active. In other words, given an event X, I would need to get the Event_Timestamp for it when Is_Active had 1 for the first time (Begin_Timestamp), and then ignore the rest of the rows for this event X with Is_Active 1 until I get Is_Active 0, so I could take the Event_Timestamp again (End_Timestamp). Then, I would continue applying this logic when I find event X having 1 in the Is_Active column again.

An example of the resulting table would be:

Event_Type  Begin_Timestamp         End_Timestamp           Duration
A           2010-10-01 00:00:00     2010-10-01 00:00:02     2 seconds
B           2010-10-01 00:00:01     2010-10-01 00:00:04     3 seconds
D           2010-10-01 00:00:03     null                    null
C           2010-10-01 00:00:05     null                    null
A           2010-10-01 00:00:06     2010-10-01 00:00:08     2 seconds

Is there any window function that would help me get these pairs of events so I could compute the duration of each event?

Best Answer

You need something like

-- get previous status, use -1 value if no previous record (because NULL value needs more complex condition in cte2)
WITH cte1 AS (SELECT Event_Type, 
                     Event_Timestamp, 
                     Is_Active, 
                     COALESCE(LAG(Is_Active) OVER (PARTITION BY Event_Type 
                                                   ORDER BY Event_Timestamp ASC), -1) prev_event
              FROM datatable),
     -- remove records where status not altered
     cte2 AS (SELECT Event_Type, 
                     Event_Timestamp, 
                     Is_Active
              FROM cte1
              WHERE Is_Active != prev_event)
-- get data we need
SELECT Event_Type, 
       Event_Timestamp Begin_Timestamp, 
       LEAD(Event_Timestamp) OVER (PARTITION BY Event_Type 
                                   ORDER BY Event_Timestamp ASC) End_Timestamp, 
       LEAD(Event_Timestamp) OVER (PARTITION BY Event_Type 
                                   ORDER BY Event_Timestamp ASC) - Event_Timestamp Duration
FROM cte2
WHERE Is_Active = 1

Of course you can combine it into the query using one CTE which calculates all window values. The above long way is used for to explain the algorythm.