Sql-server – MSSQL – convert list of datetimes in a date range

gaps-and-islandssql serversql-server-2012

I have one system which stores data like:

CREATE TABLE Reservations
    ([UserId] int, [RoomId] int, [TypeId] int, [Date] datetime)
;

Every row in the table means that Room is reserved for User of some Type in given Date

I need to convert data to this system:

CREATE TABLE Reservations
    ([UserId] int, [RoomId] int, [TypeId] int, [StartDate] datetime, [EndDate] datetime)
;

Or to be more precise, I need to just make select statement for selecting data from old system in a new way. StartDate and EndDate must represent continuous date range.

I prepared this sql fiddle. I tried to write select statement but with no success. What am I missing?

;WITH GRP AS
(
    SELECT 
    *               
    ,ROW_NUMBER() OVER(ORDER BY [Date] ASC)
        -ROW_NUMBER() OVER(PARTITION BY [UserId], [RoomId], [TypeId] ORDER BY [Date] ASC) AS Grp
FROM
    Reservations
)
, MinMax AS
(
    SELECT
        [UserId], [RoomId], [TypeId]
        ,MIN([Date]) AS StartDate
        ,MAX([Date]) AS EndDate
    FROM
        GRP
    GROUP BY
        [UserId], [RoomId], [TypeId]
        ,Grp
)

SELECT
    [UserId], [RoomId], [TypeId]
    ,StartDate  
    ,LEAD(StartDate,1,StartDate) OVER(ORDER BY StartDate) AS EndDate
FROM
    MinMax
ORDER BY
    StartDate
    ,EndDate

Best Answer

Your problem is known as an "Islands and Gaps" problem. As far as I can see your solutions is correct (given that I understood what you try to achieve) up until:

,LEAD(StartDate,1,StartDate) OVER(ORDER BY StartDate) AS EndDate

Why are you replacing the EndDate you calculated in MinMax? Try:

SELECT
    [UserId], [RoomId], [TypeId]
    ,StartDate  
    ,EndDate
FROM
    MinMax
ORDER BY
    StartDate
   ,EndDate

Regarding how you calculate the group, I assume that a reservation is for the same room, and type during the whole period. You may want to consider something like:

,ROW_NUMBER() OVER (PARTITION BY [RoomId], [TypeId] 
                    ORDER BY [Date] ASC)
-ROW_NUMBER() OVER (PARTITION BY [UserId], [RoomId], [TypeId] 
                    ORDER BY [Date] ASC) AS Grp

In your query, GRP changes regardless of which room or type that interferes with the enumeration of dates.

PS. Welcome to the forum. A Very good first question. If GRP puzzles you, I suggest that you look at the row_number functions individually, before looking at their difference DS.

EDIT:

The idea you used uses the fact that some other user interrupts the current ordering, and thereby creates a new grp. Since several users can rent the room at the same time, I don't think that will work. Here is one idea (date is a reserved word so I changed it to dt to be able to get rid of the annoying quotes []):

with start_period as (
    select a.userid, a.roomid, a.typeid, a.dt as start_date 
    from Reservations a
    where not exists (
        select 1 from Reservations b
        where a.userid = b.userid
          and a.roomid = b.roomid
          and a.typeid = b.typeid
          and a.dt = dateadd(day, 1, b.dt)
    )
), end_period as (
    select a.userid, a.roomid, a.typeid, a.dt as end_date 
    from Reservations a
    where not exists (
        select 1 from Reservations b
        where a.userid = b.userid
          and a.roomid = b.roomid
          and a.typeid = b.typeid
          and a.dt = dateadd(day, -1, b.dt)
    )
)  
select x.userid, x.roomid, x.typeid, x.start_date
     , min(y.end_date) as end_date
from start_period x
join end_period y
    on x.userid = y.userid
    and x.roomid = y.roomid
    and x.typeid = y.typeid
    and x.start_date <= y.end_date
group by x.userid, x.roomid, x.typeid, x.start_date;

1   1   2   2020-04-01T00:00:00Z    2020-04-03T00:00:00Z
1   1   2   2020-04-05T00:00:00Z    2020-04-07T00:00:00Z
1   1   2   2020-04-10T00:00:00Z    2020-04-10T00:00:00Z
2   1   2   2020-04-01T00:00:00Z    2020-04-03T00:00:00Z
2   1   2   2020-04-05T00:00:00Z    2020-04-05T00:00:00Z
2   1   3   2020-04-06T00:00:00Z    2020-04-07T00:00:00Z

Fiddle

The idea is pretty simple, get all dates (for a given user, room, type) where there is no predecessor. Then get all dates (for a given user, room, type) where there is no successor. For all start_dates, find the smallest end_date that is bigger than the start_date