Sql-server – Determine Gap Records, Fill and Return Result

sql serversql-server-2012

The following working code targets SQL-Server 2012 and I have distilled the actual problem into a manageable scenario with the code example shown below also presented on SQL Fiddle.

I seek feedback on best practices on using CTE's and temp tables and am wondering if the following could be streamlined as such.

(Note this was also asked on Code Review (Insert Of Gap Records Report Process) with no responses so far.)

Premise

Two Tables exist which house timestamped events for Agents and Management. A report will be run on the agent table. But within the Agent table, if there are event gap(s) of more than 5 hours, the resultant report has to be filled from the Management table of events for that gap.

Agent

CREATE TABLE AgentInteractions
(
   [Event]      VARCHAR(12) NOT NULL, 
   [Timestamp] [DateTime] NOT NULL
);


INSERT INTO dbo.AgentInteractions( Event, TimeStamp  )
VALUES ( 'Alpha', '24-Jan-2018 3:04:00 PM' ),
       ( 'Beta', '24-Jan-2018 10:04:00 PM' ),  -- Gap 7 hours
       ( 'Omega', '25-Jan-2018 2:04:00 AM' );  -- No Gap

Management

CREATE TABLE ManagementInteractions
(
  [Event]      VARCHAR(12) NOT NULL, 
  [Timestamp] [DateTime] NOT NULL
);


INSERT INTO dbo.ManagementInteractions( Event, TimeStamp  )
VALUES ( '5pm',      '24-Jan-2018 5:00:00 PM' ),  -- Gap Filler #1
       ( '8pm',      '24-Jan-2018 8:00:00 PM' ),  -- Gap Filler #2
       ( 'Midnight', '25-Jan-2018 12:00:00 AM' ); -- Not used

Initial Report (or first step)

With the following sql which places the report into the temp table, it calculates the time difference between the rows and sets up a sequence which was not present in the originating data.

CTE To Temp Table

IF OBJECT_ID('tempdb..#Actions') IS NOT NULL DROP TABLE #Actions;

WITH AgentActions AS 
( SELECT  ROW_NUMBER() OVER ( ORDER BY [Timestamp] ) AS [Sequence], -- Create an index number ordered by time.
          Event ,
          Timestamp
  FROM  AgentInteractions
)
SELECT  CAST('Agent' AS VARCHAR(20)) AS [Origin] ,
            AgentActions.Sequence ,
            AgentActions.Event ,
            ( SELECT    Other.Timestamp
              FROM      AgentActions Other
              WHERE     Other.Sequence = AgentActions.Sequence - 1
            ) AS Previous ,
            AgentActions.Timestamp ,
            ISNULL(DATEDIFF(HOUR,
                            ( SELECT    Other.Timestamp
                              FROM      AgentActions Other
                              WHERE     Other.Sequence = AgentActions.Sequence - 1), 
                            AgentActions.Timestamp), 
                            0) AS TimeFromLastPoint
    INTO    #Actions
    FROM    AgentActions;

Result of a select into #Actions, notice the 7 hour gap:

enter image description here

Determine Gaps

The following sql determines the gaps, and inserts the records from the Management table which fill the gap.

Insert Management Events Into Gaps

WITH Gaps AS 
( SELECT   AC.Origin ,
           AC.Sequence ,
           AC.Event ,
           AC.Previous ,
           AC.Timestamp ,
           AC.TimeFromLastPoint
  FROM     #Actions AC
  WHERE    AC.TimeFromLastPoint > 5
)
INSERT  INTO #Actions ( [Origin] , [Event] , [Timestamp] , TimeFromLastPoint)
            SELECT  'Management' ,
                    [Event] ,
                    [Timestamp] ,
                    0                      -- We will figure this out after the insert.
            FROM    ManagementInteractions MAN
            WHERE   EXISTS ( SELECT *
                             FROM   Gaps
                             WHERE  MAN.Timestamp BETWEEN Gaps.Previous
                                                  AND     Gaps.Timestamp );

SELECT Origin , Sequence , Event , Previous , Timestamp , TimeFromLastPoint 
FROM #Actions ORDER BY Timestamp;

Result of the select.

enter image description here

Final Report

The final report does what is done in step 1, it re sequences and determines time durations between all points.

CTE Deja-vu Final Report

WITH Combined AS 
( 
   SELECT Origin,
          ROW_NUMBER() OVER ( ORDER BY [Timestamp] ) AS [Sequence], -- Create an index number ordered by time.
          Event ,
          Timestamp
    FROM  #Actions
)
SELECT  Combined.Origin,
        Combined.Sequence ,
        Combined.Event ,
        ( SELECT    Other.Timestamp
          FROM      Combined Other
          WHERE     Other.Sequence = Combined.Sequence - 1
        ) AS Previous ,
        Combined.Timestamp ,
        ISNULL(DATEDIFF(HOUR,
                        ( SELECT    Other.Timestamp
                          FROM      Combined Other
                          WHERE     Other.Sequence = Combined.Sequence
                                    - 1
                        ), Combined.Timestamp), 0) AS TimeFromLastPoint

FROM  Combined;

Here are the results

enter image description here

Summary

Can any of the steps be combined or are there any practices I should use or avoid with the above sql code?

Best Answer

I think that the approach you came up with is a pretty good one. Here is a proposed simplification that avoids the self-join on dbo.AgentInteractions by using the new SQL Server 2012 LAG function as well as CTEs to distill the logic into a single query (also in SQL Fiddle form).

WITH AgentActions AS (
    -- Sequence the agent actions
    SELECT a.Event,
        LAG(a.TimeStamp, 1) OVER (ORDER BY TimeStamp ASC) AS Previous,
        a.TimeStamp,
        DATEDIFF(HOUR, LAG(a.TimeStamp, 1) OVER (ORDER BY TimeStamp ASC), a.TimeStamp) AS TimeFromLastPoint
    FROM dbo.AgentInteractions a
),
allActionsToReport AS (
    --  Agent action
    SELECT 'Agent' AS Origin, a.Event, a.TimeStamp
    FROM AgentActions a
    UNION ALL
    -- Any management actions during a gap of > 5 hours between agent interactions
    SELECT 'Management' AS Origin, m.event, m.TimeStamp
    FROM AgentActions a
    JOIN ManagementInteractions m
        ON m.TimeStamp BETWEEN a.Previous AND a.TimeStamp
    WHERE a.TimeFromLastPoint > 5
)
-- For all actions (with gaps already filled), perform final sequencing
SELECT a.Origin,
    ROW_NUMBER() OVER (ORDER BY a.TimeStamp ASC) AS Sequence,
    a.Event,
    LAG(a.TimeStamp, 1) OVER (ORDER BY TimeStamp ASC) AS Previous,
    a.TimeStamp,
    ISNULL(DATEDIFF(HOUR, LAG(a.TimeStamp, 1) OVER (ORDER BY TimeStamp ASC), a.TimeStamp), 0) AS TimeFromLastPoint
FROM allActionsToReport a

From taking a look at the query plan for this single-query version, it does exactly what you'd expect: sort the agent actions by timestamp, fill in the gaps by accessing the management actions, and then sort this combined set of resulting actions in order to do the final sequencing:

enter image description here