My question is similar to this one with (I think) significant enough difference. I have a base range and I want to find all other ranges in a table that conflict with it and each other. or rather the items that form the ranges, but it doesn't really make a difference.
The line with a asterisk is the starting range. The ranges 1,2 and 3 are the ones that are supposed to expand it. The result range should be the X.
1 |
3 | ===1==== ====
5 | ==2== ====*==== ====
6 | ====3==== =====
--+-------------------------------------
| |<--------X-------->|
I have written this:
WITH cte
AS (
SELECT DATA1.ID, DATA1.STARTDATE, DATA1.ENDDATE
FROM DATA1
WHERE
DATA1.ID = @PARID AND
DATA1.STARTDATE > @STARTDATE AND
DATA1.ENDDATE < @ENDDATE
UNION ALL
SELECT DATA1.ID, DATA1.STARTDATE, DATA1.ENDDATE
FROM cte
INNER JOIN DATA1 ON (DATA1.ID = cte.ID)
WHERE
DATA1.ID = @PARID AND
(cte.STARTDATE < DATA1.ENDDATE AND cte.ENDDATE > DATA1.ENDDATE)
)
SELECT *
FROM cte
But after some fiddling realized this can't really work, since the 2nd SELECT doesn't know what STARTDATE
and ENDDATE
it should use from the cte
. And I can't use subqueries in recursive SQL.
We have previously calculated this in client in .NET via set of recursive functions, but at O(N^2) it was stupidly slow. The intent here is to move the calculations to the server as well as optimize the calculation.
Is what I'm trying to do here at all viable?
SQL Fiddle. I'm having some trouble getting the query to run as it did on my machine (where I used less simple data structure), but at least I added example data. I will keep trying to get it to produce the same result I got on my machine.
With input range from 2018-08-24 12:00:00
to 2018-08-31 12:00:00
, the correct output should be IDs 2, 3, 4, 6
.
Best Answer
A straightforward recursive solution splits the task in two parts:
Details follow:
1. Follow the chain of overlaps leading to the earliest start date
Example code:
2. Follow the chain of overlaps leading to the latest end date
Example code:
With suitable indexing, both queries use an execution plan like:
The results given the sample data provided are:
So IDs 2, 3, 4, and 6 were used. The final range is
2018-08-22 12:00:00.000
(lowest start date found) to2018-09-02 12:00:00.000
(highest end date found).Note: The code above includes a tie-break on dates (using the
ID
column) due to duplicates in the sample data. This may, or may not, be required to solve the real problem.The indexes used were:
Demo: db<>fiddle
Note that truly optimal indexing for these types of queries can be difficult. Even so, the approach above often performs well for many practical tasks in this area.
Until proper support for interval queries and predicates is added to SQL Server, higher performance solutions require significant extra work. See: