Sql-server – How to compute number of concurrent application sessions using SQL Server table storing session data

performancequery-performancesql serverstored-procedures

I have a vendor application which stores session data in a SQL Server 2008 database table. Columns it includes are things like the sessionId, the user's IP address, the datetime stamp when the session was created (i.e. user logs in), and the datetime stamp of when the session was destroyed (i.e. user logs off or system logs off user).

My objective is to analyze all the records in this table and figure out the average number of concurrent sessions across all records.

Now, unfortunately, the session destroy date is not accurate, for reasons outside the scope of this question. Therefore, I'm using a very rough estimate for the duration of the session: 1 hour. And I can always change the number later after I have the design phase sorted out.

I'm certain I can put together a stored procedure to get the average number of concurrent sessions, but I was hoping I can accomplish it with a query.

To simplify things here, let's assume there are 5 records in the table and all were created on the same day and are in GMT time:

sessionId  sessionStart  sessionEnd  Accumulative # of Concurrent Sessions
1          12:00         13:00       1
2          12:15         13:15       2
3          12:30         13:30       3
4          12:45         13:45       4
5          13:00         14:00       4

At 13:00, the first session is destroyed. The number of concurrent sessions stays at 4, since sessions 2 through 5 still exist.

The question is how I can write a query which will output the average number of concurrent sessions? Can it be done? I imagine it would involve multiple joins on the same table, but I haven't quite figured out yet where to start.

The table has under a million records. I have access to a 2012 box and can copy the table there if it helps.

Best Answer

I find SQL Server 2012 to be a much better fit for this kind of problem because it supports ORDER BY in the OVER clause for the SUM window aggregate. Throwing your data into a temp table:

CREATE TABLE #my_sessions (sessionId INT, sessionStart DATETIME);

INSERT INTO #my_sessions VALUES
(1, '20180413 12:00:00'),
(2, '20180413 12:15:00'),
(3, '20180413 12:30:00'),
(4, '20180413 12:45:00'),
(5, '20180413 13:00:00');

I'll break the query into three parts so it's easier to understand. The first trick uses a running total to get the number of concurrent sessions at each time it changes. Imagine taking your data, assigning a 1 for the rows that create a session, and making a second copy of it for the destroyed rows and assigning a -1 for those rows. If you calculate a running total ordered by time you end up with the number of active sessions at each time the value changes.

SELECT DISTINCT
  event_time
, SUM(event_change) OVER (ORDER BY event_time RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) active_sessions
FROM #my_sessions
CROSS APPLY (
    VALUES
    (sessionStart, 1),
    (DATEADD(HOUR, 1, sessionStart), -1)
) ca (event_time, event_change);

Here are the results:

╔═════════════════════════╦═════════════════╗
║       event_time        ║ active_sessions ║
╠═════════════════════════╬═════════════════╣
║ 2018-04-13 12:00:00.000 ║               1 ║
║ 2018-04-13 12:15:00.000 ║               2 ║
║ 2018-04-13 12:30:00.000 ║               3 ║
║ 2018-04-13 12:45:00.000 ║               4 ║
║ 2018-04-13 13:00:00.000 ║               4 ║
║ 2018-04-13 13:15:00.000 ║               3 ║
║ 2018-04-13 13:30:00.000 ║               2 ║
║ 2018-04-13 13:45:00.000 ║               1 ║
║ 2018-04-13 14:00:00.000 ║               0 ║
╚═════════════════════════╩═════════════════╝

Now we need to take the average of that value. I assume that you want a weighted average by time, so what's missing is the number of minutes that each measurement should count for. SQL Server 2012 introduces the LEAD function which makes this pretty easy. Now the query is as follows:

SELECT
  active_sessions
, DATEDIFF(MINUTE, event_time, LEAD(event_time) OVER (ORDER BY event_time)) minutes_until_change
FROM 
(
    SELECT DISTINCT
      event_time
    , SUM(event_change) OVER (ORDER BY event_time RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) active_sessions
    FROM #my_sessions
    CROSS APPLY (
        VALUES
        (sessionStart, 1),
        (DATEADD(HOUR, 1, sessionStart), -1)
    ) ca (event_time, event_change)
) active_sessions

The intermediate result set:

╔═════════════════╦══════════════════════╗
║ active_sessions ║ minutes_until_change ║
╠═════════════════╬══════════════════════╣
║               1 ║ 15                   ║
║               2 ║ 15                   ║
║               3 ║ 15                   ║
║               4 ║ 15                   ║
║               4 ║ 15                   ║
║               3 ║ 15                   ║
║               2 ║ 15                   ║
║               1 ║ 15                   ║
║               0 ║ NULL                 ║
╚═════════════════╩══════════════════════╝

We need to calculate the average which is the easiest part. Putting it all together:

SELECT 1.0 * SUM(active_sessions * minutes_until_change) / SUM(minutes_until_change)
FROM
(
    SELECT
      active_sessions
    , DATEDIFF(MINUTE, event_time, LEAD(event_time) OVER (ORDER BY event_time)) minutes_until_change
    FROM 
    (
        SELECT DISTINCT
          event_time
        , SUM(event_change) OVER (ORDER BY event_time RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) active_sessions
        FROM #my_sessions
        CROSS APPLY (
            VALUES
            (sessionStart, 1),
            (DATEADD(HOUR, 1, sessionStart), -1)
        ) ca (event_time, event_change)
    ) active_sessions
) average_me
WHERE minutes_until_change IS NOT NULL;

The final result is 2.5.

Apparently, this can also be done in SQL Server 2008.

Best Answer

Related Solutions

Sql-server – SQL Server clustered index, index balancing and insert performance using NewID

Sql-server – How to find out how much i/o resources a task, session, request or transaction is currently using

Related Question