Sql-server – Merge Only Events that Start immediately after the Previous Event without any Gap

date mathgaps-and-islandssql server

How can i merge only the dates occur subsequently.?

(i.e, The Next Event begins immediately after the previous event completed).

Here, we need to consider some other Columns also for getting the Output.

Here is the Attached Screenshot of Sample Events Gantt Chart.
enter image description here
The following table contains events (M_ID, Y_ID, T_ID, E_ID, BeginDate, EndDate respectively)

CREATE TABLE #EventTable
(
    M_ID INT,
    Y_ID INT,
    T_ID INT,
    E_ID INT,
    BeginDate DATE,
    EndDate DATE
);

The Below Insert Statements for Sample Data Setup.

INSERT INTO #EventTable
VALUES
(135709, 2, 7, 1, '01 Jan 2017', '31 Mar 2017'),
(135709, 2, 7, 1, '01 Apr 2017', '30 Jun 2017'),
(135709, 2, 7, 3, '01 Jan 2017', '31 Jan 2017'),
(135709, 2, 7, 3, '01 Feb 2017', '30 Apr 2017'),
(135709, 2, 7, 3, '01 May 2017', '31 May 2017'),
(135709, 2, 7, 3, '01 Jul 2017', '31 Aug 2017'),
(135709, 2, 7, 3, '01 Oct 2017', '31 Oct 2017'),
(135709, 2, 7, 3, '01 Dec 2017', '31 Dec 2017'),
(135709, 5, 8, 1, '01 Feb 2017', '30 Apr 2017'),
(135709, 5, 8, 1, '01 Apr 2017', '31 Jul 2017'),
(134560, 5, 8, 3, '01 Apr 2017', '31 Aug 2017'),
(134560, 5, 8, 3, '01 May 2017', '31 Aug 2017'),
(134560, 5, 8, 3, '01 Oct 2017', '31 Oct 2017'),
(134560, 5, 8, 3, '01 Nov 2017', '30 Nov 2017'),
(135678, 3, 6, 2, '01 Jan 2017', '31 Mar 2017'),
(135678, 3, 6, 2, '01 Apr 2017', '30 Jun 2017'),
(135678, 3, 6, 2, '01 Oct 2017', '31 Oct 2017'),
(135678, 3, 6, 2, '01 Nov 2017', '31 Dec 2017'),
(123457, 4, 2, 3, '01 May 2017', '31 Oct 2017');

The Required Output Dates (Marked as RED) is mentioned in the above Screenshot.
enter image description here

Example:

For M_ID = 135709, Y_ID = 2, T_ID = 7 and E_ID = 1;

The 2nd Event starts on (01 Apr 2017) immediately after the 1st Event completed on (31 Mar 2017).

So, In this case, The Output should be BeginDate of 1st Event(01 Jan 2017) and EndDate of 2nd event(30 Jun 2017).

Similarly,
For M_ID = 135709, Y_ID = 2, T_ID = 7 and E_ID = 3;

The 2nd Event starts on (01 Feb 17) immediately after the 1st Event completed on (31 Jan 17).

The 3rd Event starts on (01 May 17) immediately after the 2nd Event completed on (30 Apr 17).

So, In this case, The Output should be BeginDate of 1st Event (01 Jan 17) and EndDate of 3rd Event (30 Jun 17).

And As there is GAP (i.e, 4th event doesn't start next to 3rd event) between the 4th event StartDate and 3rd Event EndDate, The Dates should remain same.

Can Someone help me out.?

Best Answer

It's a bit convoluted, but I think this gets the result you're after:

;WITH src AS 
(
SELECT et.M_ID
    , et.Y_ID
    , et.T_ID
    , et.E_ID
    , et.BeginDate
    , Marker = CASE WHEN et.BeginDate <= DATEADD(DAY, 1, LAG(et.EndDate, 1, et.BeginDate) OVER (PARTITION BY et.M_ID
            , et.Y_ID
            , et.T_ID
            , et.E_ID
             ORDER BY et.BeginDate))
        THEN 1
        ELSE 0
        END
    , et.EndDate
FROM #EventTable et
)
, src2 AS 
(
SELECT src.*
    , rn1 = ROW_NUMBER() OVER (PARTITION BY src.M_ID
            , src.Y_ID
            , src.T_ID
            , src.E_ID
            , src.Marker
             ORDER BY src.BeginDate)
    , rn2 = ROW_NUMBER() OVER (PARTITION BY src.M_ID
            , src.Y_ID
            , src.T_ID
            , src.E_ID
            , src.Marker
             ORDER BY src.BeginDate DESC)
FROM src
)
SELECT src2.M_ID
    , src2.Y_ID
    , src2.T_ID
    , src2.E_ID
    , StartDate = MIN(src2.BeginDate)
    , EndDate = MAX(src2.EndDate)
FROM src2
WHERE src2.rn1 = 1
    OR src2.rn2 = 1
GROUP BY src2.M_ID
    , src2.Y_ID
    , src2.T_ID
    , src2.E_ID
    , src2.Marker
ORDER BY src2.M_ID
    , src2.Y_ID
    , src2.T_ID
    , src2.E_ID
    , StartDate
    , EndDate;

Output:

╔════════╦══════╦══════╦══════╦════════════╦════════════╗
║  M_ID  ║ Y_ID ║ T_ID ║ E_ID ║ StartDate  ║  EndDate   ║
╠════════╬══════╬══════╬══════╬════════════╬════════════╣
║ 134560 ║    5 ║    8 ║    3 ║ 2017-04-01 ║ 2017-08-31 ║
║ 135709 ║    2 ║    7 ║    1 ║ 2017-01-01 ║ 2017-06-30 ║
║ 135709 ║    2 ║    7 ║    3 ║ 2017-01-01 ║ 2017-05-31 ║
║ 135709 ║    2 ║    7 ║    3 ║ 2017-07-01 ║ 2017-08-31 ║
║ 135709 ║    5 ║    8 ║    1 ║ 2017-02-01 ║ 2017-07-31 ║
╚════════╩══════╩══════╩══════╩════════════╩════════════╝

The code uses the LAG() aggregate to inspect the next logical row to see if the start date is prior to or adjacent-by-one-day to the end date in the current row. It then uses the two ROW_NUMBER() aggregates to get the first-and-last rows of each group.