Sql-server – Merge date ranges on updates

gaps-and-islandssql servert-sql

Given a table of date ranges, e.g. Start Date, End Date, I've seen a few examples online of how to merge ranges on the fly, but I'm able to merge them as the user makes updates.

So for example, if 01/01 to 01/02 exists and a user inserts 15/01 to 15/02, then the original range can be updated to 01/01 to 15/02.

Is there a well known way for dealing with this, or do I need to work it all out and put it in a stored procedure?

The table looks like this…

CREATE TABLE [dbo].[dateRange](
   [Id] [int] NOT NULL,
   [Start] [datetime] NOT NULL,
   [Finish] [datetime] NOT NULL
)

Given two ranges…

INSERT INTO dateRange (Id, Start, Finish) VALUES (1, '01-01-17', '01-02-17');
INSERT INTO dateRange (Id, Start, Finish) VALUES (2, '03-01-17', '01-04-17');

The user can delete, update and insert new ranges. Deletes are simple. I think there are three cases for inserts (updates have similar possibilities)…

  1. New range covers an existing range…

    INSERT INTO dateRange (Id, Start, Finish) VALUES (3, '15-12-16', '15-02-17');
    

    Results in deletion of 1.

  2. New range is covered by existing range. So…

    INSERT INTO dateRange (Id, Start, Finish) VALUES (4, '15-01-17', '16-01-17');
    

    .. is ignored.

  3. New range partially overlaps a new range.

    INSERT INTO dateRange (Id, Start, Finish) VALUES (5, '15-12-16', '15-01-17');
    

    Insert, again, is ignored and the overlapped row is extended. It's more complicated if both ends of the new range are overlapped because one of the overlapped ranges needs to be deleted.

The window functions sound cool but I'm thinking of doing it in C#.

For every user, I want zero overlapping date ranges. I'm using SQL Server 2014.

Best Answer

In one of my comments to you

I was thinking about solving your problem with a DML trigger. That would allow the work to be done in the database engine without having to return all of the data to C# for evaluation - are you open to a solution that blows away the current data and lays out a new set of rows with the proper start/finish dates based on the recent DML action? It would not be as elegant as updating existing rows.

You said (paraphrasing)

I think it's worth a look but I tend to avoid blowing data in that manner... could you give at least a brief account of your solution?

Well, it's a slow work day today and since no one else has come on stage with an answer yet, I'll step up to the mic and prepare for the hecklers. This is a prototype without any performance considerations (indexes, etc). Even with proper indexing, it may not suit your needs. Naturally, if I have misunderstood your question (or worse, have a fatal flaw), go easy on the downvotes ;)

I tried to come at this problem with the KISS principle. My approach relies on a table trigger which deletes existing data and lays out a new timeline when inserts, updates and deletes are executed. It relies heavily on logic I got from Itzik Ben-Gan's post entitled New Solution to the Packing Intervals Problem. You should definitely read Itzik's post to fully understand the logic in my trigger.

Your original question indicated that "For every user, I want zero overlapping date ranges.". I took the liberty of adding a UserID column to your table. My table also uses an IDENTITY for the Id column. Since my solution blows away existing data and inserts a new timeline, I don't specify the Id column on my inserts.

IF OBJECT_ID('dbo.DateRange') IS NOT NULL 
    DROP TABLE dbo.DateRange

CREATE TABLE [dbo].[DateRange](
   [Id] [int] IDENTITY(1,1) NOT NULL,
   [UserId] VARCHAR(10) NOT NULL,
   [Start] [datetime] NOT NULL,
   [Finish] [datetime] NOT NULL
)
go

Here's the trigger where the real fun happens

CREATE TRIGGER dbo.DateRange_Merge_Ranges 
   ON  dbo.DateRange 
   AFTER INSERT,DELETE,UPDATE
AS 
BEGIN
    SET NOCOUNT ON;

    --Dropping temp tables used in the trigger
    IF OBJECT_ID('tempdb..DataToInsert') IS NOT NULL 
        DROP TABLE #DataToInsert;
    IF OBJECT_ID('tempdb..UniqueUserIdsAffected') IS NOT NULL 
        DROP TABLE #UniqueUserIdsAffected;

    --Get all UserId's affected by the Insert, Update, Delete
    SELECT * INTO #UniqueUserIdsAffected FROM
    (
    select UserId from inserted
    union
    select UserId from deleted
    ) a

    --Calculate a new column called prvend which contains the 'previous' rows Finish date
;   WITH C1 AS
    (
        SELECT a.*,
            MAX(Finish) OVER(PARTITION BY a.UserId
                            ORDER BY a.start, a.finish
                            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prvend
        FROM dbo.DateRange a join
        #UniqueUserIdsAffected b on 
            b.UserId = a.UserId
    ),

    --Determine whether each row starts a new 'packed-interval' - call the column grp
    C2 AS
    (
    SELECT *,
        SUM(isstart) OVER(PARTITION BY C1.UserId
                        ORDER BY C1.Start, C1.Finish
                        ROWS UNBOUNDED PRECEDING) AS grp
    FROM C1
    CROSS APPLY ( VALUES(CASE WHEN Start <= prvend THEN 0 ELSE 1 END) ) AS A(isstart)
    )

    --Insert the results into a local temp table because we're going to delete
    --all rows fromthe real table matching the UserId's affected in the Insert, Update, Delete
    SELECT C2.UserId, MIN(C2.Start) AS Start, MAX(C2.Finish) AS Finish into #DataToInsert
    FROM C2 
    group by C2.UserId, C2.grp;

    --Delete all rows fromthe real table matching the UserId's affected in the Insert, Update, Delete
    DELETE a from
        dbo.DateRange a join
        #UniqueUserIdsAffected b on b.UserId = a.UserId

    --Insert into the real table the newly calculated time slices
    INSERT INTO dbo.DateRange(UserId, Start, Finish)
        SELECT UserId, Start, Finish from #DataToInsert

END
GO

Now that we have our table with the trigger defined, let's insert some rows I'm inserting rows for TOM and JOE. My DML examples deal with TOM, but I wanted you to see that there was no magic up my sleeve in keeping updates isolated to whom they belong.

INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2017-01-01', '2017-02-01');
INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2017-03-01', '2017-04-01');
INSERT INTO DateRange (UserId, Start, Finish) VALUES ('JOE', '2017-01-01', '2017-02-01');
INSERT INTO DateRange (UserId, Start, Finish) VALUES ('JOE', '2017-03-01', '2017-04-01');
select * from DateRange order by UserId, Start, Finish

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| TOM    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |

Each of the following cases build on the previous case (if one exists)


Case 1: New range covers an existing range...

INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2016-12-15', '2017-02-15');

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2016-12-15 00:00:00.000 | 2017-02-15 00:00:00.000 |
| TOM    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |

Case 2: New range is covered by existing range. So...

INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2017-01-15', '2017-01-16');

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2016-12-15 00:00:00.000 | 2017-02-15 00:00:00.000 |
| TOM    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |

Case 3: New range partially overlaps a new range.

INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2016-12-15', '2017-01-15');

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2016-12-15 00:00:00.000 | 2017-02-15 00:00:00.000 |
| TOM    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |

Case 4: Update an existing row

UPDATE DateRange SET Finish = '2017-03-02' WHERE UserId = 'TOM' and Start = '2016-12-15';

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2016-12-15 00:00:00.000 | 2017-04-01 00:00:00.000 |

Case 5: Insert a new range

INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2017-06-01', '2017-07-01');

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2016-12-15 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2017-06-01 00:00:00.000 | 2017-07-01 00:00:00.000 |

Case 6: New range partially overlaps a new range.

INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2017-04-01', '2017-05-01');

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2016-12-15 00:00:00.000 | 2017-05-01 00:00:00.000 |
| TOM    | 2017-06-01 00:00:00.000 | 2017-07-01 00:00:00.000 |

Case 7: New range partially overlaps a new range.

INSERT INTO DateRange (UserId, Start, Finish) VALUES ('TOM', '2017-05-05', '2017-06-05');

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2016-12-15 00:00:00.000 | 2017-05-01 00:00:00.000 |
| TOM    | 2017-05-05 00:00:00.000 | 2017-07-01 00:00:00.000 |

Case 8: Delete an existing row

delete from DateRange WHERE UserId = 'TOM' and Start = '2016-12-15';

| UserId | Start                   | Finish                  |
|--------|-------------------------|-------------------------|
| JOE    | 2017-01-01 00:00:00.000 | 2017-02-01 00:00:00.000 |
| JOE    | 2017-03-01 00:00:00.000 | 2017-04-01 00:00:00.000 |
| TOM    | 2017-05-05 00:00:00.000 | 2017-07-01 00:00:00.000 |

If you can achieve the same results in C# in an easier way, you should stick with that. Of course, a 'smartie' on this site may have a better solution.