Sql-server – Updating table based on conditions (overlapping dates)

sql servert-sql

Say I have a table name CustomerBooking which contains Customers, checkInDate and checkOutDate as follows. I would like to achieve the outcome as shown in condition 1 and condition 2 with update statement.

I have tried some solutions online but to no avail that it does not include NULL as infinite date in overlapping (for condition 1) and/or only takes the max date of overlapping date into account instead of taking max date from last row of overlapping dates. (for condition 2).

Do take in mind that condition 1 and condition 2 are two independent separate outcome that should be achieved with update statement based on the first table.

ID  CustomerID CheckInDate CheckOutDate 
1   1          2015-04-02  2015-04-05
2   2          2015-03-04  2015-05-02
3   1          2015-04-01  NULL(which represents infinite date)
4   1          2015-03-04  2015-05-03
5   1          2015-01-03  2015-02-03
  • Condition 1: Merge the overlapping date of same member ID (take earliest start date and max end date of overlapping dates) and update the table.

Note: Row with ID 1, ID 3 and ID 4 has overlapped dates. Therefore we combine these 3 rows to CheckInDate(2015-03-04) and CheckOutDate(NULL) by taking the min date and max date of overlapping dates. Only row with ID 1 is updated with the new dates and remaining ID 3 and ID 4 date is set to null. Rows with ID 2 and ID 5 do not have overlapping dates of same member ID therefore it remains as it is.

Final result is as shown below:

ID CustomerID CheckInDate CheckOutDate 
1   1         2015-03-04  NULL 
2   2         2015-03-04  2015-05-02
3   1         NULL        NULL 
4   1         NULL        NULL 
5   1         2015-01-03  2015-02-03
  • Condition 2: Merge the overlapping date of same member ID (take earliest start date and last end date(largest ID) of overlapping dates) and update the table.

Similar as condition 1, but this time we take the min date of overlapping dates (2015-03-04) and the check out date of the last row (row with ID 4) of overlappings date (which if 2015-05-03).

Final result is as shown below:

ID CustomerID CheckInDate CheckOutDate 
1   1         2015-03-04   2015-05-03
2   2         2015-03-04   2015-05-02
3   1         NULL              NULL
4   1         NULL               NULL
5   1         2015-01-03  2015-02-03

Best Answer

Condition 1:

WITH ord as (
    SELECT ID, CustomerID, CheckInDate, CheckOutDate
        , n = ROW_NUMBER() over(partition by [CustomerID] order by [CheckInDate], [CheckOutDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.CustomerID, o1.CheckInDate, o1.CheckOutDate, o1.n
        , m = ROW_NUMBER() over(partition by o1.[CustomerID] order by o1.[CheckInDate], o1.[CheckOutDate])
    FROM ord o1
    INNER JOIN ord o2 ON o1.CustomerID = o2.CustomerID AND o2.n+1 = o1.n AND o1.CheckInDate > o2.CheckOutDate
), groups as (
    SELECT o.ID, o.CustomerID, nx = MIN(coalesce(f.n, 1)), n = MAX(o.n)
        , p = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
    FROM ord o
    LEFT JOIN first f ON o.CustomerID = f.CustomerID AND o.n < f.n
    GROUP BY o.ID, o.CustomerID
), dates as (
    SELECT g.CustomerID, g.nx, CheckInDate = MIN(o.CheckInDate)
        , CheckOutDate = CASE WHEN SUM(CASE WHEN o.CheckOutDate IS NULL THEN 1 END) IS NULL THEN MAX(o.CheckOutDate) END
    FROM groups g
    INNER JOIN ord o ON g.ID = o.ID
    GROUP BY g.nx, g.CustomerID
    HAVING COUNT(g.nx) > 1
)
SELECT o.ID, o.CustomerID
    , CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM groups g
INNER JOIN ord o ON g.ID = o.ID
INNER JOIN dates d on g.CustomerID = d.CustomerID AND g.nx = d.nx
ORDER BY ID

This query output the rows that must be updated:

  1. ord = I first partition by CustomerID and order by CheckInDate
  2. first = I join consecutive rows from previous CTE where the next CheckInDate does not overlap previous CheckInDate and CheckDate and I partition and order them
  3. groups = I group by the previous partitioned number in order to know to which group a row belongs
  4. dates = I join with the original data in order to get the first and last date for each groups. Groups with only 1 row are removed
  5. main select output the dates for p=1 or NULL otherwise

Output:

ID  CustomerID  CheckInDate CheckOutDate
1   1           2015-03-04  NULL
3   1           NULL        NULL
4   1           NULL        NULL

Condition 2:

WITH ord as (
    SELECT ID, CustomerID, CheckInDate, CheckOutDate
        , n = ROW_NUMBER() over(partition by [CustomerID] order by [CheckInDate], [CheckOutDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.CustomerID, o1.CheckInDate, o1.CheckOutDate, o1.n
        , m = ROW_NUMBER() over(partition by o1.[CustomerID] order by o1.[CheckInDate], o1.[CheckOutDate])
    FROM ord o1
    INNER JOIN ord o2 ON o1.CustomerID = o2.CustomerID AND o2.n+1 = o1.n AND o1.CheckInDate > o2.CheckOutDate
), groups as (
    SELECT o.ID, o.CustomerID, nx = MIN(coalesce(f.n, 1)), n = MAX(o.n)
        , p = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
        , last = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID DESC)
    FROM ord o
    LEFT JOIN first f ON o.CustomerID = f.CustomerID AND o.n < f.n
    GROUP BY o.ID, o.CustomerID
), dates as (
    SELECT g.CustomerID, g.nx, CheckInDate = MIN(o.CheckInDate)
        , CheckOutDate = MAX(o2.CheckOutDate)
    FROM groups g
    INNER JOIN ord o ON g.ID = o.ID
    INNER JOIN (SELECT ID, CustomerID, nx FROM groups WHERE last = 1) l ON g.CustomerID = l.CustomerID AND g.nx = l.nx
    INNER JOIN ord o2 ON l.ID = o2.ID
    GROUP BY g.nx, g.CustomerID
    HAVING COUNT(g.nx) > 1
)
SELECT o.ID, o.CustomerID
    , CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM groups g
INNER JOIN ord o ON g.ID = o.ID
INNER JOIN dates d on g.CustomerID = d.CustomerID AND g.nx = d.nx
ORDER BY ID

Output:

ID  CustomerID  CheckInDate CheckOutDate
1   1           2015-03-04  2015-05-03
3   1           NULL        NULL
4   1           NULL        NULL

For Updates, replace SELECT by UPDATE:

UPDATE g SET 
    CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM ...

Your data:

declare @data table([ID] int, [CustomerID] int, [CheckInDate] date, [CheckOutDate] date);
Insert into @data([ID], [CustomerID], [CheckInDate], [CheckOutDate])
VALUES
    (1, 1, '2015-04-02', '2015-04-05'),
    (2, 2, '2015-03-04', '2015-05-02'),
    (3, 1, '2015-04-01', NULL),
    (4, 1, '2015-03-04', '2015-05-03'),
    (5, 1, '2015-01-03', '2015-02-03')
;

It works as well with this sample:

(1, 1, '2015-04-02', '2015-04-05'),
(2, 2, '2015-03-04', '2015-05-02'),
(3, 1, '2015-04-01', NULL),
(4, 1, '2015-03-04', '2015-05-03'),
(5, 1, '2015-01-03', '2015-02-03'),
(6, 1, '2015-01-02', '2015-02-03'),
(7, 1, '2015-03-04', '2015-03-06'),
(8, 1, '2015-03-04', '2015-05-06'),
(9, 1, '2014-04-02', '2014-04-05'),
(10, 1, '2014-03-04', '2014-05-02')

If it does not work with some of your data, update Input and Output table with more relevant values.