SQL Server – How to Sample Records by Time

sql serversql server 2014t-sql

So I've got a table in SQL Server 2014 that stores events. Events are generated no faster than once per 10 seconds, but can be much slower (minutes or hours between events). The time between events is not uniform.

I'd like to sample the records by time.
For instance, I'd like to ask for a 10 minute sampling – so it would grab record 1, and the next record should be one that is at least 10 minutes later, and then the next is at least 10 minutes later than row 2, and so on.

I can achieve this with an iterative approach, looping over each record, but I'd like to let the database do the work. The problem is, I don't know how to do this using T-SQL set/window statements.

Can anyone help me out?

Example data:

EventID | RecordTime
--------+--------------------
   1    | 2017-04-01 12:00:00
   2    | 2017-04-01 12:00:10
   3    | 2017-04-01 12:00:20
   4    | 2017-04-01 12:00:32
   5    | 2017-04-01 12:05:42
   6    | 2017-04-01 12:09:00
   7    | 2017-04-01 12:24:12
   8    | 2017-04-01 12:36:46
   9    | 2017-04-01 12:36:57
  10    | 2017-04-01 15:00:00

Desired result:

EventID | RecordTime
--------+--------------------
   1    | 2017-04-01 12:00:00
   7    | 2017-04-01 12:24:12
   8    | 2017-04-01 12:36:46
  10    | 2017-04-01 15:00:00
  • EventIDs 2-6 are all within 10 minutes of record 1, so I don't want them.
  • EventID 7 is 24 minutes 12 seconds later than the last record in my result set (EventID 1), so I do want it.
  • EventID 8 is 12 minutes 34 seconds later than the last record in my result set (EventID 7), so I do want it.
  • EventID 9 is 11 seconds later than the last record in my result set (EventID 8), so I do not want it.
  • EventID 10 is apx. 2.5 hours later than the last record in my result set (EventID 9), so I do want it.

Best Answer

If you don't want event at 12:10:01, then answer is very simple - use LEAD or LAG to compare timestamps of the two consecutive rows. Just be aware that if you have a long sequence of events in which each pair of events is less than 10 minutes apart, then the query will return only the first event of this sequence.

In other words, there has to be gap of more than 10 minutes between two events to add a row to the result set.

Sample data

DECLARE @T TABLE(EventID int, RecordTime datetime2(0));
INSERT INTO @T (EventID, RecordTime) VALUES
( 1, '2017-04-01 12:00:00'),
( 2, '2017-04-01 12:00:10'),
( 3, '2017-04-01 12:00:20'),
( 4, '2017-04-01 12:00:32'),
( 5, '2017-04-01 12:05:42'),
( 6, '2017-04-01 12:09:00'),
( 7, '2017-04-01 12:24:12'),
( 8, '2017-04-01 12:36:46'),
( 9, '2017-04-01 12:36:57'),
(10, '2017-04-01 15:00:00');

Query

WITH
CTE
AS
(
    SELECT
        EventID
        ,RecordTime
        ,LAG(RecordTime) OVER (ORDER BY EventID) AS PrevRecordTime
    FROM @T
)
SELECT
    EventID
    ,RecordTime
FROM
    CTE
    CROSS APPLY
    (
        SELECT DATEDIFF(second, PrevRecordTime, RecordTime) AS RecordDiffSeconds
    ) AS CA
WHERE
    RecordDiffSeconds IS NULL
    OR RecordDiffSeconds > 600
ORDER BY EventID;

Result

+---------+---------------------+
| EventID |     RecordTime      |
+---------+---------------------+
|       1 | 2017-04-01 12:00:00 |
|       7 | 2017-04-01 12:24:12 |
|       8 | 2017-04-01 12:36:46 |
|      10 | 2017-04-01 15:00:00 |
+---------+---------------------+