Sql-server – the best way to get all data for a date range, plus the last event just before the range

sql serversql-server-2008-r2t-sql

For this question assume the following:

  • There is a table of events [Event]
  • There is a table of employees [Employee]
  • There is a table that relates the two [EventEmployee] (EventID + EmployeeID)
  • Each event is assigned to n employees
  • These events can occur at random dates/times

I need to query for the events that start and/or end within a specified time frame. I also need to include the first event prior to the specified date range PER EMPLOYEE assigned to the event.

For example, if I query from 03/01/2014 to 04/01/2014 and I return 15 events with 5 employees assigned to each. I also need to return the most recent event prior to 03/01/2014 for each of those employees.

I am currently accomplishing this by:

  • Getting all of my events for the date range into a temp table
  • Selecting everything out of the temp table
  • UNION with a query that gets the DISTINCT [EmployeeID]s based on the temp table and does a CROSS APPLY on [Event] joined with [EventEmployee] to get the most recent event prior to the start date of the range.

Is there a more efficient way to accomplish this task?

Please tell me if some sample code would help the explanation.

EDIT:
My Query:

Assuming @StartDate and @EndDate are DATETIME parameters,

-- Get all events that start or end within a given time frame
SELECT
    e.[EventID],
    e.[EventStart],
    e.[EventEnd],
    e.[EventTypeID]
INTO #EventTemp
FROM
    [Events] AS e
WHERE
    ( e.[EventStart] >= @StartDate AND e.[EventStart] <= @EndDate )
    OR ( e.[EventEnd] >= @StartDate AND e.[EventEnd] <= @EndDate )

SELECT
    #EventTemp.[EventID],
    #EventTemp.[EventStart],
    #EventTemp.[EventEnd],
    #EventTemp.[EventTypeID]
FROM
    #EventTemp
UNION
SELECT
    PrevEvent.[EventID],
    PrevEvent.[EventStart],
    PrevEvent.[EventEnd],
    PrevEvent.[EventTypeID]
FROM
    -- Get a DISTINCT list of employeeIDs from those events
    (
        SELECT DISTINCT
            ee.[EmployeeID]
        FROM
            #EventTemp
            INNER JOIN [EventEmployee] AS ee ON
                #EventTemp.[EventID] = ee.[EventID]
    ) AS Emp
    -- Get the most recent prior event for each distinct employeeID    
    CROSS APPLY (
        SELECT TOP 1
            e.[EventID],
            e.[EventStart],
            e.[EventEnd],
            e.[EventTypeID]
        FROM
            [Events] AS e
            INNER JOIN [EventEmployee] AS ee ON
                e.[EventID] = ee.[EventID]
        WHERE
            ee.[EmployeeID] = Emp.[EmployeeID]
            AND e.[EventEnd] < @StartDate
        ORDER BY 
            e.[EventEnd] DESC
    ) AS PrevEvent

Lets say we end up with 4 events in the temp table #EventsTemp. Each of those events could be tied to any number of employees. For this example, lets assume between all the events in #EventsTemp we are dealing with 3 employees.

For the final select I would want to get the list of events from #TempEvents, in addition I would also want the most recent event, prior to @StartDate, that each of those 3 employees were at. If 2 of the employees were at the same event I would expect to get a single extra row for that event, plus an extra row for the 3rd employee:

EventID,EventStart,EventEnd,EventType
5,'2014-04-01 13:00','2014-04-01 15:00',1
6,'2014-04-01 16:00','2014-04-01 19:00',1
7,'2014-04-02 10:00','2014-04-02 14:00',1
8,'2014-04-02 16:00','2014-04-02 18:00',1
2,'2014-03-26 13:00','2014-03-26 15:00',1
1,'2014-03-25 10:00','2014-03-25 14:00',1

In this example, the first four rows were straight from #EventsTemp, and the last two rows are the distinct prior events attended by any combination of the employees that attended the original 4 events from #EventsTemp

Best Answer

I am going to assume that there isn't an index on the date columns, otherwise I think that the query would have been structured differently. If there is, you can probably find a better performing one than this.

The advantage of this query is that it can get all the data in one scan. The disadvantage is that it has to sort the data and join EventEmployee on the entire table. So as always, test with your own situation. This query also assumes that the MAX date is either unique or that equivalent rows would be acceptable.

USE AdventureWorks2012
GO
;
WITH Base AS (
   SELECT 
      TransactionHistory.*
      ,ProductVendor.BusinessEntityID
      ,MAX(CASE WHEN TransactionDate < '2008-08-01' THEN TransactionDate END) 
           OVER (PARTITION BY ProductVendor.BusinessEntityID) AS PreviousVendorTransaction
      ,COUNT(CASE WHEN TransactionDate >= '2008-08-01' THEN 1 END ) 
           OVER (PARTITION BY ProductVendor.BusinessEntityID) AS VendorAfterCutoff
   FROM
      Production.TransactionHistory
      -- Doesn't make the most sense, but I need a repeating relation
      INNER JOIN Purchasing.ProductVendor
         ON TransactionHistory.ProductID = ProductVendor.ProductID
),
Filtered AS (
   SELECT
      *
   FROM
      Base
   WHERE
      Base.TransactionDate >= '2008-08-01'
      OR (TransactionDate = PreviousVendorTransaction AND VendorAfterCutoff > 0)
)
SELECT DISTINCT
   TransactionID
   ,ProductID
   ,ReferenceOrderID
   ,ReferenceOrderLineID
   ,TransactionDate
   ,TransactionType
   ,Quantity
   ,ActualCost
   ,ModifiedDate
FROM
   Filtered

Edit:

Hmm, I think I may have to take back my comment on structuring it differently if there are indexes. The other suggestions that I have are probably fairly minor.

  • Make sure the query is using the indexes you're expecting it to. Start and End date to build temp table, end date to drive the previous event loop.
  • If the query to build the temp table is doing a lookup on the clustered index, it may be better to hold off and do that as part of the main query.
  • Try using a cte instead of a temp table. I think that a cte might be more competitive with the way that the query is structured below.
  • If you are returning a lot of events, it might be better to pull out the event table lookup to the main query to give the optimizer the option of doing a merge join.
  • I don't see a way of optimizing the previous event lookup short of an indexed view.

Here's a query that combines a few of those ideas.

SELECT
    e.[EventID]
INTO #EventTemp
FROM
    [Events] AS e
WHERE
    ( e.[EventStart] >= @StartDate AND e.[EventStart] <= @EndDate )
    OR ( e.[EventEnd] >= @StartDate AND e.[EventEnd] <= @EndDate )

;
WITH PrevEvent AS (
    SELECT
        EmpPrevEvent.[EventID]
    FROM
    (
        SELECT DISTINCT
            ee.[EmployeeID]
        FROM
            #EventTemp
            INNER JOIN [EventEmployee] AS ee ON
                #EventTemp.[EventID] = ee.[EventID]
    ) AS Emp
    CROSS APPLY (
        SELECT TOP 1
            e.[EventID]
        FROM
            [Events] AS e
            INNER JOIN [EventEmployee] AS ee ON
                e.[EventID] = ee.[EventID]
        WHERE
            ee.[EmployeeID] = Emp.[EmployeeID]
            AND e.[EventEnd] < @StartDate
        ORDER BY 
            e.[EventEnd] DESC
    ) AS EmpPrevEvent
)
SELECT
    e.[EventID],
    e.[EventStart],
    e.[EventEnd],
    e.[EventTypeID]
FROM
    [Events] AS e
WHERE
    e.EventID IN (
        SELECT EventID
        FROM #EventTemp
        UNION
        SELECT EventID
        FROM PrevEvent
    )