Sql-server – Aggregates per half hour intervals within a time span

sql serversql-server-2012t-sql

Consider the following table:

CREATE TABLE dbo.Events (
    [ClientID] int,
    [EventID] int,
    [EventName] varchar(50),
    [StartDate] datetime,
    [Duration] bigint
)

INSERT INTO dbo.Events(ClientID, EventID, EventName, StartDate, Duration) 
VALUES (1, 1, 'Login', '2016-11-27 01:30:00.000', 86400000),
       (2, 1, 'Login', '2016-11-27 00:30:00.000', 86400000),
       (3, 1, 'Login', '2016-11-27 00:00:00.000', 86400000),
       (4, 1, 'Login', '2016-11-28 23:30:00.000', 172800000),
       (1, 2, 'Lock', '2016-11-27 23:30:00.000', 3600000),
       (4, 2, 'Lock', '2016-11-28 23:30:00.000', 1800000)

As you can see, the Duration is in milliseconds and can span multiple days.

I need to see how many clients were Logged In, and how many clients were Locked, and for how long during a given date range. The date range can span a day/week/month/year, and when it does span more than a day, it should still bucket by half hour.

In the end, the results for a query on the date range 2016-11-28 00:00:00.000 – 2016-11-30 00:00:00.000 would look like so (apologies if I miscalculated the SumDuration or CountClients somewhere, I did this manually but I think it is correct):

EventID  EventName  HalfHour  SumDuration  CountClients
--------------------------------------------------
1        'Login'    0:00      5400000      3 
1        'Login'    0:30      3600000      2 
1        'Login'    1:00      3600000      2
1        'Login'    1:30      1800000      1 
...      ...        ...       ...          ...
1        'Login'    23:00     1800000      1 
1        'Login'    23:30     3600000      1
2        'Lock'     0:00      3600000      2 
2        'Lock'     0:30      1800000      1 
...      ...        ...       ...          ...
2        'Lock'     23:00     1800000      1 
2        'Lock'     23:30     3600000      2 

The ellipses are for the sake of brevity in repeated results, but there should be 48 results per Event (1 for every half hour in a day). You will also notice that the Duration needs to be calculated during that half hour. It is not in the example data, but it is possible to have a Duration end at any time, not just a half hour mark. Last, I should note the result for 'Login' at 23:30. ClientID 4 had a Login event for both 2016-11-28 and 2016-11-29 during the 23:30-0:00 half hour. So, the Duration sums both, but only counts the Client once.

I have the following, which gives me the CountClients and SumDuration, but doesn't break it up into half-hour increments:

select E.EventID,
       E.EventName,
       count(*) as CountClients,
       sum(datediff(millisecond, I.StartDate, I.EndDate)) as SumDuration
from dbo.Events as E
  cross apply (
              select max(T.StartDate) as StartDate,
                     min(T.EndDate) as EndDate
              from (
                   values(@StartDate, @EndDate),
                         (E.StartDate, dateadd(millisecond, E.Duration, E.StartDate))
                   ) as T(StartDate, EndDate)
              ) as I
where E.StartDate < @EndDate and
      dateadd(millisecond, E.Duration, E.StartDate) > @StartDate
group by E.EventID,
         E.EventName;

How would I go about breaking this up into half hour increments?

Best Answer

First, let's look at how to get all the time slices.

DECLARE @FirstDay smalldatetime = '20161128', 
        @LastDay  smalldatetime = '20161129';

SELECT TOP (24*2*DATEDIFF(DAY, @FirstDay, DATEADD(DAY,1,@LastDay)))
  DATEADD(MINUTE, 30*(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1), @FirstDay)
FROM master..spt_values;

This will give you your set of 96 time slots, in half hour increments, from the 28th at midnight through the 29th at 23:30. A Numbers table is a better option than master.dbo.spt_values, but you can also use sys.all_objects - or any table really that has enough rows to cover your desired date range.

Next, let's see how to find the rows in the Events table that you care about - basically anything that passes through your window.

DECLARE @FirstDay smalldatetime = '20161128', 
        @LastDay  smalldatetime = '20161129';

SELECT EventID, EventName, StartDate, 
  EndDate = DATEADD(MILLISECOND, Duration, StartDate)
FROM dbo.Events
WHERE StartDate >= DATEADD(MILLISECOND, -Duration, @FirstDay) 
  AND StartDate <  DATEADD(DAY, 1, @LastDay);

Now, how to use that to get the results you're after (edited to accommodate durations that do not end on the half hour mark):

DECLARE @FirstDay smalldatetime = '20161128', 
        @LastDay  smalldatetime = '20161129';

;WITH TimeSlots(HalfHour) AS 
(
  SELECT TOP (24*2*DATEDIFF(DAY, @FirstDay, DATEADD(DAY,1,@LastDay)))
    DATEADD(MINUTE, 30*(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1), @FirstDay)
  FROM master..spt_values
),
EventRange AS
(
  SELECT EventID, EventName, StartDate, 
    EndDate = DATEADD(MILLISECOND, Duration, StartDate)
  FROM dbo.Events
  WHERE StartDate >= DATEADD(MILLISECOND, -Duration, @FirstDay)
    AND StartDate <  DATEADD(DAY, 1, @LastDay)
),
Combo AS
(
  SELECT e.EventID, e.EventName, 
    HalfHourSlot = CONVERT(CHAR(5), CONVERT(datetime, CONVERT(time, t.HalfHour)), 108), 
    CASE
        WHEN e.StartDate > DATEADD(minute, 30, t.HalfHour) OR e.EndDate <= t.HalfHour
            THEN 0
        ELSE
            DATEDIFF(MINUTE, CASE WHEN e.StartDate >= t.HalfHour THEN e.StartDate ELSE t.HalfHour END, CASE WHEN e.EndDate <= DATEADD(minute, 30, t.HalfHour) THEN e.EndDate ELSE DATEADD(minute, 30, t.HalfHour) END)
    END AS Duration
  FROM TimeSlots AS t
  CROSS JOIN EventRange AS e
)
SELECT EventID, EventName, HalfHour = HalfHourSlot,
  SumDuration  = SUM(Duration), 
  CountClients = SUM(CASE WHEN Duration>0 THEN 1 ELSE 0 END) 
FROM Combo
GROUP BY EventID, EventName, HalfHourSlot
ORDER BY EventName DESC, HalfHour;