SQL Server – Counting MAX Event Simultaneously Using Tally Table

aggregatesql serversql-server-2019stored-procedures

Hi,

I have a task that is meant to summarize several connections coming in to a given set of ports (trunks), into timeslots and count the results. The thing is that I would like to achieve two type of counts:

One count should count the total number of connection to a Trunk in the given slot (30 minutes per slot)
Secondly I would like to find out the MAX number of connection that occurred simultaneously: such as there have been 10 connections totally however only 3 of them was connected on simultaneously

Table: RAW_DATA

 
GatewayName     StartDateTime                   DisconnectDateTime              ConnectionDuration  Trunk
GW1             2021-02-24 20:01:00.0000000     2021-02-24 20:05:30.0000000     270000              T1  -- 1 *
GW1             2021-02-24 20:05:20.6000000     2021-02-24 20:07:50.1000000     149500              T1  -- 2 *
GW1             2021-02-24 20:04:50.0000000     2021-02-24 20:08:24.0000000     214000              T1  -- 3 *
GW1             2021-02-24 20:15:50.0000000     2021-02-24 20:17:00.0000000     70000               T1  -- 0
GW1             2021-02-24 20:20:50.0000000     2021-02-24 20:21:00.0000000     10000               T1  -- 1
GW1             2021-02-24 20:20:59.0000000     2021-02-24 20:24:00.0000000     181000              T1  -- 2
GW1             2021-02-24 20:25:00.0000000     2021-02-24 20:28:30.0000000     210000              T1  -- 0
GW2             2021-02-24 20:41:49.0000000     2021-02-24 20:43:24.0000000     95000               T2
GW3             2021-02-24 22:46:54.2000000     2021-02-24 22:48:25.2000000     91000               T1
GW99            2021-02-24 22:47:25.1000000     2021-02-24 22:47:54.4000000     29300               T2

The source data table is depicted above, I added some comments on the first few lines indicating how I would count. So as you can see the first three lines occur simultaneously (at leas parts of it) then on the forth line I start over (indicated with a 0 here). So what I would like is to save the three indicated with *to be the "winner" giving a count Simultaneous = 3

I tried to add it into a store procedure, I have been able to get the tally table and counting the 'ALL' , however I'm clueless when it comes to counting the Simultaneous connections.

CREATE OR ALTER PROCEDURE [schemma].[TrunkSummery]
@date datetime2(7),
@period int
AS
BEGIN
    -- Creat timeslotable
        with numbers(val) as 
            (select 1 union all select val + 1 from numbers where val < 48) -- 48 hence 30 * 2 = 60 min -> 24h
        select @date, nbr.val, 
           dateadd(minute, (nbr.val - 1) * 30, @date) as period_start, 
           dateadd(minute, (nbr.val    ) * 30, @date) as period_end 
        from numbers as nbr 
        order by nbr.val;
        
    -- Enummurate over timeslot table and RAW_DAtA
        with numbers(val) as 
            (select 1 union all select val + 1 from numbers where val < 48), -- 48 hence 30 * 2 = 60 min -> 24h
        periods as (
            select @date as [date], nbr.val, 
           dateadd(minute, (nbr.val - 1) * 30, @date) as period_start, 
           dateadd(minute, (nbr.val    ) * 30, @date) as period_end 
           from numbers as nbr)
        select pers.period_start, @period as Period, src.trunk, count(src.trunk) as 'all' 
        /*, count(Simultaneous) as 'sim' */ 
        --from periods as pers left  join @raw_data as src 
        from periods as pers inner join @raw_data as src 
        on src.StartDateTime >= pers.period_start and src.StartDateTime < pers.period_end
        group by src.trunk, pers.period_start, pers.period_end
        order by src.trunk;
END
GO

EXECUTE [schemma].[TrunkSummery] @date = '20210224', @period = 30;
GO

Output tables

-- Current
period_start                    Period  trunk   all
2021-02-24 20:00:00.0000000     30      T1      7
2021-02-24 22:30:00.0000000     30      T1      1
2021-02-24 20:30:00.0000000     30      T2      1
2021-02-24 22:30:00.0000000     30      T2      1

-- Desiered
period_start                    Period  trunk   all  sim
2021-02-24 20:00:00.0000000     30      T1      7    3   
2021-02-24 22:30:00.0000000     30      T1      1    1
2021-02-24 20:30:00.0000000     30      T2      1    1
2021-02-24 22:30:00.0000000     30      T2      1    1

Is this possible somehow?

Fiddle example here

Thanks to everyone in advance 🙂

update 1

After thinking reading up a bit more, maybe I could use a ‘LEAD’ function somehow to see if ‘disconnectDateTime’ on current row is less then ‘StartDateTime’ on the row below it an store the number of hops until ‘DisconnectDateTime’ is less than ‘StartDateTime’.
And continue on like this for all rows within the slot. And if a clause surpass the last MAX count update it with the new record.

Update 2

I have added a temp table that is run before I run the timeslot sorting, the purpus of this table is to look at the next rows StartDateTimeto see if it is before current rows DisconnectDateTime

GatewayName StartDateTime           DisconnectDateTime      ConDur  Trunk   nrDDT                   sim
GW1         2021-02-24 20:01:00.0   2021-02-24 20:05:30.0   270000  T1      2021-02-24 20:04:50.0   2 -- OK 
GW1         2021-02-24 20:04:50.0   2021-02-24 20:08:24.0   214000  T1      2021-02-24 20:05:20.6   2 -- Bad: 3
GW1         2021-02-24 20:05:20.6   2021-02-24 20:07:50.1   149500  T1      2021-02-24 20:15:50.0   0
GW1         2021-02-24 20:15:50.0   2021-02-24 20:17:00.0   70000   T1      2021-02-24 20:20:50.0   0
GW1         2021-02-24 20:20:50.0   2021-02-24 20:21:00.0   10000   T1      2021-02-24 20:20:59.0   2 -- OK
GW1         2021-02-24 20:20:59.0   2021-02-24 20:24:00.0   181000  T1      2021-02-24 20:25:00.0   0
GW1         2021-02-24 20:25:00.0   2021-02-24 20:28:30.0   210000  T1      2021-02-24 22:46:54.2   0
GW3         2021-02-24 22:46:54.2   2021-02-24 22:48:25.2   91000   T1      NULL                    0
GW2         2021-02-24 20:41:49.0   2021-02-24 20:43:24.0   95000   T2      2021-02-24 22:47:25.1   0
GW99        2021-02-24 22:47:25.1   2021-02-24 22:47:54.4   29300   T2      NULL                    0

The issue is that I would like to increase a count if my case is met (sim column), I have tried with both a global and a local variable however it is resetting for every row and I can't force it back to 0if my else clause is entered.

CREATE OR ALTER PROCEDURE GenerateTrunkSum
@date datetime2(7),
@period int
AS
BEGIN

DECLARE @raw_data table
(
GatewayName varchar(23),
StartDateTime datetime2(7),
DisconnectDateTime datetime2(7),
ConnectionDuration int ,
Trunk varchar(10)
);

INSERT INTO @raw_data values('GW1', '2021-02-24 20:01:00.0', '2021-02-24 20:05:30.0', DATEDIFF(millisecond, '2021-02-24 20:01:00.0', '2021-02-24 20:05:30.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:05:20.6', '2021-02-24 20:07:50.1', DATEDIFF(millisecond, '2021-02-24 20:05:20.6', '2021-02-24 20:07:50.1'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:04:50.0', '2021-02-24 20:08:24.0', DATEDIFF(millisecond, '2021-02-24 20:04:50.0', '2021-02-24 20:08:24.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:15:50.0', '2021-02-24 20:17:00.0', DATEDIFF(millisecond, '2021-02-24 20:15:50.0', '2021-02-24 20:17:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:20:50.0', '2021-02-24 20:21:00.0', DATEDIFF(millisecond, '2021-02-24 20:20:50.0', '2021-02-24 20:21:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:20:59.0', '2021-02-24 20:24:00.0', DATEDIFF(millisecond, '2021-02-24 20:20:59.0', '2021-02-24 20:24:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:25:00.0', '2021-02-24 20:28:30.0', DATEDIFF(millisecond, '2021-02-24 20:25:00.0', '2021-02-24 20:28:30.0'), 'T1')
INSERT INTO @raw_data values('GW2', '2021-02-24 20:41:49.0 ', '2021-02-24 20:43:24.0', DATEDIFF(millisecond, '2021-02-24 20:41:49.0 ', '2021-02-24 20:43:24.0'), 'T2')
INSERT INTO @raw_data values('GW3', '2021-02-24 22:46:54.2', '2021-02-24 22:48:25.2', DATEDIFF(millisecond, '2021-02-24 22:46:54.2', '2021-02-24 22:48:25.2'), 'T1')
INSERT INTO @raw_data values('GW99', '2021-02-24 22:47:25.1', '2021-02-24 22:47:54.4', DATEDIFF(millisecond, '2021-02-24 22:47:25.1', '2021-02-24 22:47:54.4'), 'T2')


declare @localvar int -- Alt 2
set @localvar = 1
   SELECT GatewayName, StartDateTime, DisconnectDateTime, ConnectionDuration, Trunk, LEAD(StartDateTime, 1, NULL) OVER ( PARTITION BY Trunk ORDER BY  StartDateTime ) as nrDDT,
       CASE 
           WHEN DATEDIFF(MILLISECOND, LEAD(StartDateTime, 1, NULL) OVER ( PARTITION BY Trunk ORDER BY  StartDateTime ), DisconnectDateTime) >= 0 THEN @localvar + 1 -- Add if Match 1 = 1 M3 = 3 etc
           ELSE 0 -- Reset @localvar = 0
       END AS sim
   INTO #Temp
   FROM @raw_data;

       select * from #Temp;
   
   -- Creat timeslotable
       with numbers(val) as 
           (select 1 union all select val + 1 from numbers where val < 48)
       select @date, nbr.val, 
           dateadd(minute, (nbr.val - 1) * 30, @date) as period_start, 
           dateadd(minute, (nbr.val    ) * 30, @date) as period_end 
       from numbers as nbr 
       order by nbr.val;

       --Enummerate
       with numbers(val) as 
           (select 1 union all select val + 1 from numbers where val < 48),
       periods as (
           select @date as [date], nbr.val, 
           dateadd(minute, (nbr.val - 1) * 30, @date) as period_start, 
           dateadd(minute, (nbr.val    ) * 30, @date) as period_end 
           from numbers as nbr)
       select pers.period_start, @period as Period, src.trunk, count(src.GatewayName) as 'all', 
           -- Case Added in update 2
           CASE
               WHEN MAX(src.sim) < 1 THEN 1 -- if max is 0 set 1, defaults to at least one active
               ELSE MAX(src.sim)
           END AS simultaneous
       --from periods as pers left  join HDO.CDR_RAW as src 
       from periods as pers inner join #Temp as src 
       on src.StartDateTime >= pers.period_start and src.StartDateTime < pers.period_end
       group by src.trunk, pers.period_start
       order by src.trunk 
END
GO

EXECUTE GenerateTrunkSum @date = '20210224', @period = 1800;

So my question is: Does someone know how-to get this count mechanic to work ? The reason for why I want it like this is to be able to do a MAXwithin my last select statement (see CASE)

Update 3

Once I checked the current answer below by @bbaird, with a larger more true dataset; StartDateTime & DisconnectDateTime could overflow into a different period.

Updated Fiddle

Current output

period_start            period_end              Period  Trunk   All Sim
2021-02-24 20:00:00.0   2021-02-24 20:30:00.0   30      E1      2   4
2021-02-24 20:30:00.0   2021-02-24 21:00:00.0   30      E1      23  6
2021-02-24 20:30:00.0   2021-02-24 21:00:00.0   30      E2      1   1
2021-02-24 20:00:00.0   2021-02-24 20:30:00.0   30      T1      2   4
2021-02-24 20:30:00.0   2021-02-24 21:00:00.0   30      T1      23  6
2021-02-24 20:30:00.0   2021-02-24 21:00:00.0   30      T2      1   1

As you can see the period 2021-02-24 20:00:00.0 for Trunk = 'E1'has a total count of 2 but a Sim of 4. I can't figure out why this is.

If we look at the rows surrounding 2021-02-24 20:00:00.0 for E1

 -- @raw_data
 1 ('GW1', '2021-02-24 20:23:43.1', '2021-02-24 20:32:32.9', ..., 'T1', 'E1' ), 
 2 ('GW1', '2021-02-24 20:24:34.6', '2021-02-24 20:33:05.1', ..., 'T1', 'E1' ), 
 3 ('GW1', '2021-02-24 20:31:09.5', '2021-02-24 20:32:51.4', ..., 'T1', 'E1' ),
 4 ('GW1', '2021-02-24 20:32:42.3', '2021-02-24 20:33:09.3', ..., 'T1', 'E1' ), 
 5 ('GW1', '2021-02-24 20:34:50.6', '2021-02-24 20:37:30.9', ..., 'T1', 'E1' ), 
 6 ('GW1', '2021-02-24 20:35:34.2', '2021-02-24 20:36:25.4', ..., 'T1', 'E1' ),

Looking at the data, I would say that the row 1 and 2 should be the only one that are counted(count(*) as 'ALL') for sim T:20:00 as et seems based on the output shown above however, I'm not sure where the extra two Simcounts comes from. Based on some analyses I have done of the output, It could be Connectionsummary.ConnectionCountfrom ,MAX(COALESCE(ConnectionSummary.ConnectionCount,0)) AS Sim since ConnectionSummaryseams to have consecutive rows even after a 30 min mark is meet; I assume that this needs to be group into timeslots before enumerating it.

Secondly it becomes more trickery when it comes to 20:30 for a solid query I assume that row 1 and 2 should probably be counted as part of sim since row 3 occurs simultaneously. Is there any possibility to count entries twice if the surpasses a 30 min-mark(00/30) ?

Best Answer

We can get the number of connections active for each trunk/period/interval combination like so:

WITH Numbers(val) AS
(
  SELECT
    1 
    
    UNION ALL
    
  SELECT
    val + 1 
  FROM
    numbers 
  WHERE
    val < 48
)
,ConnectionPeriod AS
(
  SELECT
    period.period_start
   ,period.period_end
   ,@period as Period
   ,src.Trunk
   ,CASE
      WHEN src.StartDateTime < period.period_start THEN period.period_start
      ELSE src.StartDateTime
    END AS StartDateTime
   ,CASE
      WHEN src.DisconnectDateTime > period.period_end THEN period.period_end
      ELSE src.DisconnectDateTime
    END AS DisconnectDateTime
  FROM
    (
      SELECT
        dateadd(minute, (val - 1) * 30, @date) as period_start
       ,dateadd(minute, (val    ) * 30, @date) as period_end 
      FROM
        numbers
    ) period 
  INNER JOIN
    (
      SELECT 
        TrunkIngress as Trunk
       ,StartDateTime
       ,DisconnectDateTime
      FROM 
        @raw_data

    UNION ALL

      SELECT 
        TrunkEgress
       ,StartDateTime
       ,DisconnectDateTime
      FROM 
        @raw_data
    ) src
      ON src.StartDateTime >= period.period_start
          AND src.StartDateTime < period.period_end
)
SELECT
  cp1.period_start
 ,cp1.Trunk
 ,cp1.StartDateTime
 ,cp1.DisconnectDateTime
 ,COUNT(*) AS ConnectionCount
FROM
  ConnectionPeriod cp1
LEFT JOIN
  ConnectionPeriod cp2
    ON cp2.Trunk = cp1.Trunk
        AND cp2.StartDateTime < cp1.DisconnectDateTime
        AND cp2.DisconnectDateTime > cp1.StartDateTime
GROUP BY
  cp1.period_start
 ,cp1.Trunk
 ,cp1.StartDateTime
 ,cp1.DisconnectDateTime

Since you only want to count duplicate connections that begin on/after a given period, we first join each Connection to Period to create these combinations. Then we adjust the StartDateTime and DisconnectDateTime to be within each Period.

It's then a matter of joining this to your penultimate aggregation and taking the max of ConnectionCount:

WITH Numbers(val) AS
(
  SELECT
    1 
    
    UNION ALL
    
  SELECT
    val + 1 
  FROM
    numbers 
  WHERE
    val < 48
)
,ConnectionPeriod AS
(
  SELECT
    period.period_start
   ,period.period_end
   ,@period as Period
   ,src.Trunk
   ,CASE
      WHEN src.StartDateTime < period.period_start THEN period.period_start
      ELSE src.StartDateTime
    END AS StartDateTime
   ,CASE
      WHEN src.DisconnectDateTime > period.period_end THEN period.period_end
      ELSE src.DisconnectDateTime
    END AS DisconnectDateTime
  FROM
    (
      SELECT
        dateadd(minute, (val - 1) * 30, @date) as period_start
       ,dateadd(minute, (val    ) * 30, @date) as period_end 
      FROM
        numbers
    ) period 
  INNER JOIN
    (
      SELECT 
        TrunkIngress as Trunk
       ,StartDateTime
       ,DisconnectDateTime
      FROM 
        @raw_data

    UNION ALL

      SELECT 
        TrunkEgress
       ,StartDateTime
       ,DisconnectDateTime
      FROM 
        @raw_data
    ) src
      ON src.StartDateTime >= period.period_start
          AND src.StartDateTime < period.period_end
)
SELECT
  PeriodSummary.period_start
 ,PeriodSummary.Period
 ,PeriodSummary.Trunk
 ,PeriodSummary.[all]
 ,MAX(COALESCE(ConnectionSummary.ConnectionCount,0)) AS sim
FROM
  (
    SELECT
      period_start
     ,period_end
     ,@period as Period
     ,Trunk
     ,COUNT(*) as [all]
    FROM
      ConnectionPeriod
    GROUP BY
     Trunk
    ,period_start
    ,period_end
  ) PeriodSummary
INNER JOIN
  (
    SELECT
      cp1.period_start
     ,cp1.Trunk
     ,cp1.StartDateTime
     ,cp1.DisconnectDateTime
     ,COUNT(*) AS ConnectionCount
    FROM
      ConnectionPeriod cp1
    LEFT JOIN
      ConnectionPeriod cp2
        ON cp2.Trunk = cp1.Trunk
            AND cp2.StartDateTime < cp1.DisconnectDateTime
            AND cp2.DisconnectDateTime > cp1.StartDateTime
    GROUP BY
      cp1.period_start
     ,cp1.Trunk
     ,cp1.StartDateTime
     ,cp1.DisconnectDateTime
  ) ConnectionSummary
    ON ConnectionSummary.Trunk = PeriodSummary.Trunk
         AND ConnectionSummary.period_start =  PeriodSummary.Period_Start
GROUP BY
  PeriodSummary.period_start
 ,PeriodSummary.Period
 ,PeriodSummary.Trunk
 ,PeriodSummary.[all]

Updated Fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=317652b78da1165c60b3f0c1f3055107

One other thing to consider, I'm not sure if you want to count connections that started during a specific interval, or connections that were active during that interval. If the latter, you'll need to change your join logic to:

src.StartDateTime < pers.period_end
  AND src.DisconnectDateTime > pers.period_start

And

ConnectionSummary.Trunk = PeriodSummary.Trunk
  AND ConnectionSummary.StartDateTime < PeriodSummary.Period_End  
  AND ConnectionSummary.DisconnectDateTime > PeriodSummary.Period_Start

Respectively.

This query will probably run fine with a small number of rows, but given the lack of good join criteria it may grind a bit considering it has to join the source table to the periods three times. In that case, it may make sense to replace the ConnectionPeriod CTE with a temporary table.

update 1

Update 2

Update 3

Best Answer

Related Solutions

MAX for each subset

How to Avoid Using Variables in SQL Server WHERE Clause

Related Question