SQL Server Split Intervals – How to Split Intervals in SQL Server

azure-synapse-analyticsintervalsql server

I have two tables. Each holds some attributes for a business entity and the date range for which those attributes were valid. I want to combine these tables into one, matching rows on the common business key and splitting the time ranges.

The real-world example is two source temporal tables feeding a type-2 dimension table in the data warehouse.

The entity can be present in neither, one or both of the source systems at any point in time. Once an entity is recorded in a source system the intervals are well-behaved – no gaps, duplicates or other monkey business. Membership in the sources can end at different dates.

The business rules state we only want to return intervals where the entity is present in both sources simultaneously.

What query will give this result?

This illustrates the situation:

Month          J     F     M     A     M     J     J
Source A:  <--><----------><----------><---->
Source B:            <----><----><----------------><-->
               
Result:              <----><----><----><---->

Sample Data

For simplicity I've used closed date intervals; likely any solution could be extended to half-open intervals with a little typing.

drop table if exists dbo.SourceA;
drop table if exists dbo.SourceB;
go

create table dbo.SourceA
(
    BusinessKey int,
    StartDate   date,
    EndDate     date,
    Attribute   char(9)
);

create table dbo.SourceB
(
    BusinessKey int,
    StartDate   date,
    EndDate     date,
    Attribute   char(9)
);
GO


insert dbo.SourceA(BusinessKey, StartDate, EndDate, Attribute)
values
    (1, '19990101', '19990113', 'black'),
    (1, '19990114', '19990313', 'red'),
    (1, '19990314', '19990513', 'blue'),
    (1, '19990514', '19990613', 'green'),
    (2, '20110714', '20110913', 'pink'),
    (2, '20110914', '20111113', 'white'),
    (2, '20111114', '20111213', 'gray');

insert dbo.SourceB(BusinessKey, StartDate, EndDate, Attribute)
values
    (1, '19990214', '19990313', 'left'),
    (1, '19990314', '19990413', 'right'),
    (1, '19990414', '19990713', 'centre'),
    (1, '19990714', '19990730', 'back'),
    (2, '20110814', '20110913', 'top'),
    (2, '20110914', '20111013', 'middle'),
    (2, '20111014', '20120113', 'bottom');

Desired output

BusinessKey StartDate   EndDate     a_Colour  b_Placement
----------- ----------  ----------  --------- -----------
1           1999-02-14  1999-03-13  red       left     
1           1999-03-14  1999-04-13  blue      right    
1           1999-04-14  1999-05-13  blue      centre   
1           1999-05-14  1999-06-13  green     centre   
2           2011-08-14  2011-09-13  pink      top      
2           2011-09-14  2011-10-13  white     middle   
2           2011-10-14  2011-11-13  white     bottom   
2           2011-11-14  2011-12-13  gray      bottom    

Best Answer

I may have misunderstood your question, but the results seem to be according to your question:

select a.businesskey
     -- greatest(a.startdate, b.startdate)
     , case when a.startdate > b.startdate 
            then a.startdate 
            else b.startdate 
       end as startdate
     -- least(a.enddate, b.enddate)
     , case when a.enddate < b.enddate 
            then a.enddate 
            else b.enddate 
       end as enddate
     , a.attribute as a_color
     , b.attribute as b_placement
from dbo.SourceA a 
join dbo.SourceB b 
        on a.businesskey = b.businesskey
       and (a.startdate between b.startdate and b.enddate 
          or b.startdate between a.startdate and a.enddate)
order by 1,2

Since intervals need to overlap most of the work can be done with a join with that as the predicate. Then it's just a matter of choosing the intersection of the intervals.

LEAST and GREATEST seem to be missing as functions, so I used a case expression instead.

Fiddle