SQL Server – Count Items Not in Time Range

ctedaterecursivesql servert-sql

I'm attempting to count the number of items if they are not within 30 seconds of the first item in a "group". I'm having a hard time figuring this out.

So, I have this table:

WITH ADates AS (
    SELECT
          Id
        , SharedId
        , TheDateTime
    FROM (VALUES
        (CAST(1 AS int), CAST(1 AS int), CAST('2019-01-01 01:01:00.00' AS datetime2(7))),
        (2, 1, '2019-01-01 01:01:33.00'),
        (3, 1, '2019-01-01 01:02:00.00'),
        (4, 1, '2019-01-01 01:02:01.00'),
        (5, 1, '2019-01-01 01:02:04.00'),
        (6, 1, '2019-01-01 01:06:15.00'),
        (7, 2, '2019-01-01 01:06:00.00'),
        (8, 2, '2019-01-01 01:06:45.00'),
        (9, 1, '2019-01-01 01:02:31.00'),
        (10, 2, '2019-01-01 01:06:05.00'),
        (11, 2, '2019-01-01 01:06:46.00'),
    ) X (Id, SharedId, TheDateTime)
)

So, the expected result that I'm looking for is:

+==========+=======+
| SharedId | Count |
+==========+=======+
| 1        | 4     |
+----------+-------+
| 2        | 2     |
+----------+-------+

The numbers are determined by:

  1. Count since first in new group.
  2. Not within 30 seconds of previous group so it is a new group and count.
  3. Don't count since it is within 30 seconds of 2.
  4. Don't count since it is within 30 seconds of 2.
  5. Count since not within 30 seconds of previous group (Item 2).
  6. Count since not within 30 seconds of previous group (item 2).
  7. Count in new group for SharedId.
  8. Count since not within previous grouping.

I'm thinking I should be doing a Window Function for this. Just not sure how to have it rely on just the first of the group.

Best Answer

Problem Categorization

I was looking for a way to use an analytical function that keeps track of inline manipulations. A single run analytical function is only able to perform so much, but not to the extend to solve this problem. The problem with nesting analytical functions is that we loose information about our dynamic pattern.

To allow dynamic inline pattern matching, in Oracle you can use MATCH_RECOGNIZE. I had no clue how to do it in Sql Server though. Then I came across a similar problem, which got resolved using a recursive CTE.


Proposed Solution

Fiddle with solution

- SharedId    GroupStartDateTime
- 1           01/01/2019 01:01:00
- 1           01/01/2019 01:01:33
- 1           01/01/2019 01:02:04
- 1           01/01/2019 01:06:15
- 2           01/01/2019 01:06:00
- 2           01/01/2019 01:06:45 
6 rows

CteBase and CteRecursive are heavely inspired by Bogdan Sahlean's answer on this related question.

WITH CteBase
AS
(
        SELECT  v.SharedId,
                v.TheDateTime,
                ROW_NUMBER() OVER(PARTITION BY v.SharedId ORDER BY v.TheDateTime) 
                       AS RowNum
        FROM    ADates v
),  CteRecursive
AS
(
        SELECT  crt.SharedId,
                crt.TheDateTime,
                crt.TheDateTime AS GroupStartDateTime,
                crt.RowNum,
                1 AS SharedIdRowNum
        FROM    CteBase crt
        WHERE   crt.RowNum = 1
        UNION ALL
        SELECT  crt.SharedId,
                crt.TheDateTime,
                CASE 
                    WHEN DATEDIFF(SECOND, prv.GroupStartDateTime, crt.TheDateTime) <= 30 
                    THEN prv.GroupStartDateTime 
                    ELSE crt.TheDateTime 
                END,
                crt.RowNum,
                CASE 
                    WHEN DATEDIFF(SECOND, prv.GroupStartDateTime, crt.TheDateTime) <= 30 
                    THEN prv.SharedIdRowNum + 1
                    ELSE 1
                END             
        FROM    CteBase crt
        INNER JOIN CteRecursive prv ON crt.SharedId = prv.SharedId 
        AND     crt.RowNum = prv.RowNum + 1
)
SELECT SharedId, Count(*) as [COUNT] FROM (
        SELECT  r.SharedId,
                r.GroupStartDateTime
        FROM    CteRecursive r
        WHERE   r.SharedIdRowNum = 1
) X
GROUP BY SharedId;