Sql-server – Merging continous date range using T-SQL

coalescesql serversql-server-2008t-sql

I have a table containing a part time working pattern consisting of start and end dates together with hours values.

Where there is an gap between an end date and a subsequent start date, this indicates the member was full time.

Where the most recent entry has a NULL end date, this indicates the part time work is still ongoing. A NULL enddte should be a day before the subsequent start date, if one is present.

I'd like to merge the date ranges where there are no gaps (ie, subsequent row start date is 1 day ahead of previous row start date) and where the hrs1 and hrs2 values match. Other solutions for similar problems don't correctly handle the NULL in the final row.

CREATE TABLE #hours(
STARTDTE DATETIME,
ENDDTE DATETIME,
hrs1 DECIMAL(6,2),
hrs2 DECIMAL(6,2))

INSERT INTO #hours VALUES
('25 MAY 1990','30 NOV 1994',18,36),
('01 DEC 1994','31 DEC 1994',18,36),
('01 JAN 2000', NULL,20,36),
('16 JUN 2002','20 APR 2007',18,36),
('10 OCT 2008', NULL,20,36)

The expected output would be as follows:

25 MAY 1990 | 31 DEC 1994 | 18 | 36
01 JAN 2000 | 15 JUN 2002 | 20 | 36
16 JUN 2002 | 20 APR 2007 | 18 | 36
10 OCT 2008 | NULL        | 20 | 36

This query needs to be compatible with SQL Server 2008 so unfortunately lead/lag aren't an option to fill in any null end dates.

Best Answer

Assuming hours are added to table in chronological order. If not, sort them before inserting them!

First we need to relate each record to successive records.
You say that is done by matching their hrs1 & hrs2, and making sure the initial record ends one day before the next starts. Your sample data only includes two records that are sequential, so I added a third (startdte='1995-01-01') to show this works in such a case. If we only expected two records to be part of the same sequence, we could just use a join. But if an indefinite number may be part of the same sequence, then we must use recursion. In this case, a CTE. The first part of the UNION gets the parent records, and the second part of the UNION gets all children. To uniquely track which sequence is which, we use the 'chain' field as a sort of bread crumb trail. Finally, we query the root ID (which we tracked as we went along) and the 'chain' field to find the first and last record in each sequence. Then we join it all back to the original data. Last, we clean up null 'ENDDTE's for full time records.

declare @hours table (
ID int IDENTITY(1,1), -- we need a unique identifier to easily reference each row.
RootID int,
STARTDTE DATETIME,
ENDDTE DATETIME,
hrs1 DECIMAL(6,2),
hrs2 DECIMAL(6,2))

INSERT @hours (STARTDTE,ENDDTE,hrs1,hrs2) VALUES
('25 MAY 1990','30 NOV 1994',18,36),
('01 DEC 1994','31 DEC 1994',18,36),
('1 JAN 1995','1 FEB 1995',18,36),
('01 JAN 2000', NULL,20,36),
('16 JUN 2002','20 APR 2007',18,36),
('10 OCT 2008', NULL,20,36)

;with
hours (chain,recordsInChain,ID, ParentID, RootID,STARTDTE,ENDDTE,hrs1,hrs2) AS (
    select 
        convert(varchar(1000),convert(varchar(10),hc.id)+','), -- This gives us a snapshot of the entire series of records to which this record is related.
        1, hc.ID, hc.ID ,hc.ID, hc.STARTDTE, 
        hc.ENDDTE, 
        hc.hrs1, hc.hrs2
    from 
        @hours hc 
        LEFT OUTER JOIN @hours hp on hc.hrs1=hp.hrs1 and hc.hrs2=hp.hrs2 and hc.STARTDTE = dateadd(dd,1,hp.ENDDTE) -- Record starts on the day after previous record ends.          
    where hp.id is null -- We want root records.  Not records with parents.
    UNION ALL
    select convert(varchar(1000),chain +convert(varchar(10),hc.id)+ ','),recordsInChain +1,  hc.ID, hp.ID ParentID, 
        convert(int,left(hp.chain,charINDEX(',',hp.chain)-1))
        , hp.STARTDTE, hc.ENDDTE, hc.hrs1, hc.hrs2
    from 
        @hours hc 
        INNER JOIN hours hp on (hc.hrs1=hp.hrs1 and hc.hrs2=hp.hrs2 and hc.STARTDTE = dateadd(dd,1,hp.enddte))
        -- We need the end date for fulltime records, but this next line is not allowed in the recursive part of a CTE so we will have to get it later.
        --LEFT OUTER JOIN @hours hFullTime on hc.ENDDTE is null and hFullTime.id = hc.id + 1 and hFullTime.STARTDTE > hc.STARTDTE
)
select hS.startdte,isnull(hE.ENDDTE,hFullTime.STARTDTE-1) enddte,hs.hrs1,hs.hrs2 --,hs.*,hE.*,isnull(hE.ENDDTE,hFullTime.STARTDTE-1)
from
    ( select rootid, max(recordsinchain) recordsInChain from hours  group by rootid ) hRange 
    inner join hours hS on hRange.RootID = hS.ID
    inner join hours hE on hRange.rootID = hE.RootID and hE.recordsInChain=hRange.recordsInChain
    -- Now we will get the end date for full time records, since we couldn't do it in the CTE.
    LEFT OUTER JOIN @hours hFullTime on hE.ENDDTE is null and hFullTime.id = hE.id + 1 and hFullTime.STARTDTE > hE.STARTDTE

Related Solutions

Sql-server – SQL Query with multiple conditions on date

You can use isnull to take care of the parameters when they are null.

Something like this should do what you want.

select T.*
from #TempTbl as T
where T.StartDtm <= isnull(@StopDtm, '99991231') and 
      (T.StopDtm >= isnull(@StartDtm, '17530101') or T.StopDtm is null)

Sql-server – the best way to get all data for a date range, plus the last event just before the range

I am going to assume that there isn't an index on the date columns, otherwise I think that the query would have been structured differently. If there is, you can probably find a better performing one than this.

The advantage of this query is that it can get all the data in one scan. The disadvantage is that it has to sort the data and join EventEmployee on the entire table. So as always, test with your own situation. This query also assumes that the MAX date is either unique or that equivalent rows would be acceptable.

USE AdventureWorks2012
GO
;
WITH Base AS (
   SELECT 
      TransactionHistory.*
      ,ProductVendor.BusinessEntityID
      ,MAX(CASE WHEN TransactionDate < '2008-08-01' THEN TransactionDate END) 
           OVER (PARTITION BY ProductVendor.BusinessEntityID) AS PreviousVendorTransaction
      ,COUNT(CASE WHEN TransactionDate >= '2008-08-01' THEN 1 END ) 
           OVER (PARTITION BY ProductVendor.BusinessEntityID) AS VendorAfterCutoff
   FROM
      Production.TransactionHistory
      -- Doesn't make the most sense, but I need a repeating relation
      INNER JOIN Purchasing.ProductVendor
         ON TransactionHistory.ProductID = ProductVendor.ProductID
),
Filtered AS (
   SELECT
      *
   FROM
      Base
   WHERE
      Base.TransactionDate >= '2008-08-01'
      OR (TransactionDate = PreviousVendorTransaction AND VendorAfterCutoff > 0)
)
SELECT DISTINCT
   TransactionID
   ,ProductID
   ,ReferenceOrderID
   ,ReferenceOrderLineID
   ,TransactionDate
   ,TransactionType
   ,Quantity
   ,ActualCost
   ,ModifiedDate
FROM
   Filtered

Edit:

Hmm, I think I may have to take back my comment on structuring it differently if there are indexes. The other suggestions that I have are probably fairly minor.

Make sure the query is using the indexes you're expecting it to. Start and End date to build temp table, end date to drive the previous event loop.
If the query to build the temp table is doing a lookup on the clustered index, it may be better to hold off and do that as part of the main query.
Try using a cte instead of a temp table. I think that a cte might be more competitive with the way that the query is structured below.
If you are returning a lot of events, it might be better to pull out the event table lookup to the main query to give the optimizer the option of doing a merge join.
I don't see a way of optimizing the previous event lookup short of an indexed view.

Here's a query that combines a few of those ideas.

SELECT
    e.[EventID]
INTO #EventTemp
FROM
    [Events] AS e
WHERE
    ( e.[EventStart] >= @StartDate AND e.[EventStart] <= @EndDate )
    OR ( e.[EventEnd] >= @StartDate AND e.[EventEnd] <= @EndDate )

;
WITH PrevEvent AS (
    SELECT
        EmpPrevEvent.[EventID]
    FROM
    (
        SELECT DISTINCT
            ee.[EmployeeID]
        FROM
            #EventTemp
            INNER JOIN [EventEmployee] AS ee ON
                #EventTemp.[EventID] = ee.[EventID]
    ) AS Emp
    CROSS APPLY (
        SELECT TOP 1
            e.[EventID]
        FROM
            [Events] AS e
            INNER JOIN [EventEmployee] AS ee ON
                e.[EventID] = ee.[EventID]
        WHERE
            ee.[EmployeeID] = Emp.[EmployeeID]
            AND e.[EventEnd] < @StartDate
        ORDER BY 
            e.[EventEnd] DESC
    ) AS EmpPrevEvent
)
SELECT
    e.[EventID],
    e.[EventStart],
    e.[EventEnd],
    e.[EventTypeID]
FROM
    [Events] AS e
WHERE
    e.EventID IN (
        SELECT EventID
        FROM #EventTemp
        UNION
        SELECT EventID
        FROM PrevEvent
    )

Best Answer

Related Solutions

Sql-server – SQL Query with multiple conditions on date

Sql-server – the best way to get all data for a date range, plus the last event just before the range

Related Question