I have two tables. Each holds some attributes for a business entity and the date range for which those attributes were valid. I want to combine these tables into one, matching rows on the common business key and splitting the time ranges.
The real-world example is two source temporal tables feeding a type-2 dimension table in the data warehouse.
The entity can be present in neither, one or both of the source systems at any point in time. Once an entity is recorded in a source system the intervals are well-behaved – no gaps, duplicates or other monkey business. Membership in the sources can end at different dates.
The business rules state we only want to return intervals where the entity is present in both sources simultaneously.
What query will give this result?
This illustrates the situation:
Month J F M A M J J
Source A: <--><----------><----------><---->
Source B: <----><----><----------------><-->
Result: <----><----><----><---->
Sample Data
For simplicity I've used closed date intervals; likely any solution could be extended to half-open intervals with a little typing.
drop table if exists dbo.SourceA;
drop table if exists dbo.SourceB;
go
create table dbo.SourceA
(
BusinessKey int,
StartDate date,
EndDate date,
Attribute char(9)
);
create table dbo.SourceB
(
BusinessKey int,
StartDate date,
EndDate date,
Attribute char(9)
);
GO
insert dbo.SourceA(BusinessKey, StartDate, EndDate, Attribute)
values
(1, '19990101', '19990113', 'black'),
(1, '19990114', '19990313', 'red'),
(1, '19990314', '19990513', 'blue'),
(1, '19990514', '19990613', 'green'),
(2, '20110714', '20110913', 'pink'),
(2, '20110914', '20111113', 'white'),
(2, '20111114', '20111213', 'gray');
insert dbo.SourceB(BusinessKey, StartDate, EndDate, Attribute)
values
(1, '19990214', '19990313', 'left'),
(1, '19990314', '19990413', 'right'),
(1, '19990414', '19990713', 'centre'),
(1, '19990714', '19990730', 'back'),
(2, '20110814', '20110913', 'top'),
(2, '20110914', '20111013', 'middle'),
(2, '20111014', '20120113', 'bottom');
Desired output
BusinessKey StartDate EndDate a_Colour b_Placement
----------- ---------- ---------- --------- -----------
1 1999-02-14 1999-03-13 red left
1 1999-03-14 1999-04-13 blue right
1 1999-04-14 1999-05-13 blue centre
1 1999-05-14 1999-06-13 green centre
2 2011-08-14 2011-09-13 pink top
2 2011-09-14 2011-10-13 white middle
2 2011-10-14 2011-11-13 white bottom
2 2011-11-14 2011-12-13 gray bottom
Best Answer
I may have misunderstood your question, but the results seem to be according to your question:
Since intervals need to overlap most of the work can be done with a join with that as the predicate. Then it's just a matter of choosing the intersection of the intervals.
LEAST and GREATEST seem to be missing as functions, so I used a case expression instead.
Fiddle