Sql-server – Merging continous date range using T-SQL

coalescesql serversql-server-2008t-sql

I have a table containing a part time working pattern consisting of start and end dates together with hours values.

Where there is an gap between an end date and a subsequent start date, this indicates the member was full time.

Where the most recent entry has a NULL end date, this indicates the part time work is still ongoing. A NULL enddte should be a day before the subsequent start date, if one is present.

I'd like to merge the date ranges where there are no gaps (ie, subsequent row start date is 1 day ahead of previous row start date) and where the hrs1 and hrs2 values match. Other solutions for similar problems don't correctly handle the NULL in the final row.

CREATE TABLE #hours(
STARTDTE DATETIME,
ENDDTE DATETIME,
hrs1 DECIMAL(6,2),
hrs2 DECIMAL(6,2))

INSERT INTO #hours VALUES
('25 MAY 1990','30 NOV 1994',18,36),
('01 DEC 1994','31 DEC 1994',18,36),
('01 JAN 2000', NULL,20,36),
('16 JUN 2002','20 APR 2007',18,36),
('10 OCT 2008', NULL,20,36)

The expected output would be as follows:

25 MAY 1990 | 31 DEC 1994 | 18 | 36
01 JAN 2000 | 15 JUN 2002 | 20 | 36
16 JUN 2002 | 20 APR 2007 | 18 | 36
10 OCT 2008 | NULL        | 20 | 36

This query needs to be compatible with SQL Server 2008 so unfortunately lead/lag aren't an option to fill in any null end dates.

Best Answer

Assuming hours are added to table in chronological order. If not, sort them before inserting them!

First we need to relate each record to successive records.
You say that is done by matching their hrs1 & hrs2, and making sure the initial record ends one day before the next starts. Your sample data only includes two records that are sequential, so I added a third (startdte='1995-01-01') to show this works in such a case. If we only expected two records to be part of the same sequence, we could just use a join. But if an indefinite number may be part of the same sequence, then we must use recursion. In this case, a CTE. The first part of the UNION gets the parent records, and the second part of the UNION gets all children. To uniquely track which sequence is which, we use the 'chain' field as a sort of bread crumb trail. Finally, we query the root ID (which we tracked as we went along) and the 'chain' field to find the first and last record in each sequence. Then we join it all back to the original data. Last, we clean up null 'ENDDTE's for full time records.

declare @hours table (
ID int IDENTITY(1,1), -- we need a unique identifier to easily reference each row.
RootID int,
STARTDTE DATETIME,
ENDDTE DATETIME,
hrs1 DECIMAL(6,2),
hrs2 DECIMAL(6,2))

INSERT @hours (STARTDTE,ENDDTE,hrs1,hrs2) VALUES
('25 MAY 1990','30 NOV 1994',18,36),
('01 DEC 1994','31 DEC 1994',18,36),
('1 JAN 1995','1 FEB 1995',18,36),
('01 JAN 2000', NULL,20,36),
('16 JUN 2002','20 APR 2007',18,36),
('10 OCT 2008', NULL,20,36)

;with
hours (chain,recordsInChain,ID, ParentID, RootID,STARTDTE,ENDDTE,hrs1,hrs2) AS (
    select 
        convert(varchar(1000),convert(varchar(10),hc.id)+','), -- This gives us a snapshot of the entire series of records to which this record is related.
        1, hc.ID, hc.ID ,hc.ID, hc.STARTDTE, 
        hc.ENDDTE, 
        hc.hrs1, hc.hrs2
    from 
        @hours hc 
        LEFT OUTER JOIN @hours hp on hc.hrs1=hp.hrs1 and hc.hrs2=hp.hrs2 and hc.STARTDTE = dateadd(dd,1,hp.ENDDTE) -- Record starts on the day after previous record ends.          
    where hp.id is null -- We want root records.  Not records with parents.
    UNION ALL
    select convert(varchar(1000),chain +convert(varchar(10),hc.id)+ ','),recordsInChain +1,  hc.ID, hp.ID ParentID, 
        convert(int,left(hp.chain,charINDEX(',',hp.chain)-1))
        , hp.STARTDTE, hc.ENDDTE, hc.hrs1, hc.hrs2
    from 
        @hours hc 
        INNER JOIN hours hp on (hc.hrs1=hp.hrs1 and hc.hrs2=hp.hrs2 and hc.STARTDTE = dateadd(dd,1,hp.enddte))
        -- We need the end date for fulltime records, but this next line is not allowed in the recursive part of a CTE so we will have to get it later.
        --LEFT OUTER JOIN @hours hFullTime on hc.ENDDTE is null and hFullTime.id = hc.id + 1 and hFullTime.STARTDTE > hc.STARTDTE
)
select hS.startdte,isnull(hE.ENDDTE,hFullTime.STARTDTE-1) enddte,hs.hrs1,hs.hrs2 --,hs.*,hE.*,isnull(hE.ENDDTE,hFullTime.STARTDTE-1)
from
    ( select rootid, max(recordsinchain) recordsInChain from hours  group by rootid ) hRange 
    inner join hours hS on hRange.RootID = hS.ID
    inner join hours hE on hRange.rootID = hE.RootID and hE.recordsInChain=hRange.recordsInChain
    -- Now we will get the end date for full time records, since we couldn't do it in the CTE.
    LEFT OUTER JOIN @hours hFullTime on hE.ENDDTE is null and hFullTime.id = hE.id + 1 and hFullTime.STARTDTE > hE.STARTDTE