Sql-server – Derive Date Spans from Start and End Dates in SQL Server table

date-math, sql-server

I am using SQL Server 2016.

I have a table that contains one row for each month that a patient is assigned to a particular provider.

A patient can be assigned to multiple providers during the year.

How can I derive date spans (StartDate and EndDate) that represent the time a patient was assigned to each provider?

My table looks like this:

+----------+---------+-----------+----------+
| Provider | Patient | StartDate | EndDate  |
+----------+---------+-----------+----------+
| 1922157  | 12345   | 20191201  | 20191231 |
| 1904176  | 12345   | 20191101  | 20191201 |
| 1904176  | 12345   | 20191001  | 20191101 |
| 1904176  | 12345   | 20190901  | 20191001 |
| 1904176  | 12345   | 20190801  | 20190901 |
| 1904176  | 12345   | 20190701  | 20190801 |
| 1904176  | 12345   | 20190601  | 20190701 |
| 1904176  | 12345   | 20190501  | 20190601 |
| 1904176  | 12345   | 20190401  | 20190501 |
| 1904176  | 12345   | 20190301  | 20190401 |
| 1904176  | 12345   | 20190201  | 20190301 |
| 1922157  | 12345   | 20190101  | 20190201 |
| 1922157  | 56789   | 20190101  | 20190201 |
+----------+---------+-----------+----------+

In this case, patient 12345 was assigned to two different providers: one (1922157) for 2 months, January and December, and the other (1904176) for the rest of the year (10 months), February through November. Patient 56789 was only assigned to one provider (1922157) for 1 month (in December).

I'm trying to get output that looks like the table below, but I think I'm running into issues because the patient is assigned to the same PCP during two separate periods of the year. I tried using the LAG function, but it only gives correct results for some cases, not all of them (this case being one that fails).

+----------+---------+-----------+----------+
| Provider | Patient | StartDate | EndDate  |
+----------+---------+-----------+----------+
| 1922157  | 12345   | 20190101  | 20190201 |
| 1904176  | 12345   | 20190201  | 20191201 |
| 1922157  | 12345   | 20191201  | 20191231 |
| 1922157  | 56789   | 20191201  | 20191231 |
+----------+---------+-----------+----------+
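One common way to make LAG work for this shape of data is to flag every row that does not continue the previous row for the same patient (either the provider changed or there is a gap in the dates), then turn those flags into span numbers with a running SUM and group by them. The sketch below is only an illustration under stated assumptions: the table name dbo.PatientProvider is hypothetical, StartDate and EndDate are assumed to be date columns, there is assumed to be one row per patient per month, and a span is assumed to continue when the previous row's EndDate equals the current row's StartDate (as in the sample data).

```sql
-- Hypothetical table: dbo.PatientProvider (Provider int, Patient int, StartDate date, EndDate date).
-- Assumes one row per patient per month, and that a span continues when the previous
-- row's EndDate equals this row's StartDate (true for the sample data).
WITH Flagged AS
(
    SELECT Provider, Patient, StartDate, EndDate,
           -- 0 if this row continues the previous row (same provider, no gap), else 1;
           -- the first row per patient has a NULL LAG, so it falls through to 1
           CASE WHEN LAG(Provider) OVER (PARTITION BY Patient ORDER BY StartDate) = Provider
                 AND LAG(EndDate)  OVER (PARTITION BY Patient ORDER BY StartDate) = StartDate
                THEN 0 ELSE 1 END AS IsNewSpan
    FROM dbo.PatientProvider
),
Numbered AS
(
    SELECT Provider, Patient, StartDate, EndDate,
           -- Running total of span starts: every row in the same span gets the same SpanId
           SUM(IsNewSpan) OVER (PARTITION BY Patient ORDER BY StartDate
                                ROWS UNBOUNDED PRECEDING) AS SpanId
    FROM Flagged
)
SELECT Provider, Patient, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM Numbered
GROUP BY Patient, SpanId, Provider
ORDER BY Patient, MIN(StartDate);
```

Because the comparison is made against the previous row for the patient, not the previous row for the provider, a return to an earlier provider (1922157 in January and again in December) starts a new span instead of being merged with the earlier one.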

Update: I did some more research and came across the following post:

https://stackoverflow.com/questions/35900765/ms-sql-combine-date-rows-into-start-end-date

I fit my table into the code from the answer to that question, tested a few of my cases, and it looks like it might get the job done. Unfortunately, my base table has 140k rows of dates to calculate through, so I am not sure how long it will take to run. It has been running for 6 minutes now; I will post back with results.
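If a window-function query that partitions by Patient and orders by StartDate (for example one built on LAG, as mentioned in the question) turns out to be slow on the 140k-row table, an index whose key matches that partitioning and ordering, and that covers the other columns the query reads, may let SQL Server read the rows already in order instead of adding a sort. This is a sketch only; the table and index names are hypothetical, and whether the optimizer uses the index should be checked in the actual execution plan.

```sql
-- Hypothetical table and index names.
-- Key columns match PARTITION BY Patient ORDER BY StartDate; INCLUDE covers the remaining columns.
CREATE NONCLUSTERED INDEX IX_PatientProvider_Patient_StartDate
    ON dbo.PatientProvider (Patient, StartDate)
    INCLUDE (Provider, EndDate);
```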

Best Answer

I think I understand what you're trying to do: you want the start date and end date of each period a patient spent with a provider, as long as there is no gap between the end date of one period and the start date of the next. I've created a test table with the data you sampled.

CREATE TABLE test (Provider int, Patient int, StartDate date, EndDate date);

INSERT INTO test (Provider, Patient, StartDate, EndDate)
VALUES (1922157, 12345, '2019-12-01', '2019-12-31'),
       (1904176, 12345, '2019-11-01', '2019-12-01'),
       (1904176, 12345, '2019-10-01', '2019-11-01'),
       (1904176, 12345, '2019-09-01', '2019-10-01'),
       (1904176, 12345, '2019-08-01', '2019-09-01'),
       (1904176, 12345, '2019-07-01', '2019-08-01'),
       (1904176, 12345, '2019-06-01', '2019-07-01'),
       (1904176, 12345, '2019-05-01', '2019-06-01'),
       (1904176, 12345, '2019-04-01', '2019-05-01'),
       (1904176, 12345, '2019-03-01', '2019-04-01'),
       (1904176, 12345, '2019-02-01', '2019-03-01'),
       (1922157, 12345, '2019-01-01', '2019-02-01'),
       (1922157, 56789, '2019-01-01', '2019-02-01');

The idea is to order the data and pair each row with the row (same provider and patient) whose StartDate equals its EndDate, which detects any hole in the dates. I number the rows with the ROW_NUMBER function and do the pairing with a self-join. I then take the first StartDate and the maximum EndDate of the rows that have a match, and add all the rows that are "alone" and have no match.

I think it works with the data you provided, but I haven't tested it with other data. Recursion (a recursive CTE) is another option for finding the min/max dates of each run, but I didn't use it in this case. (Feel free to choose better names; I went a little fast.)

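;WITH RowsWithNum AS
(
    -- Number all rows in provider, patient, start-date order
    SELECT Provider, Patient, StartDate, EndDate,
           ROW_NUMBER() OVER (ORDER BY Provider, Patient, StartDate) AS RowNum
    FROM test
)
, BeforeAndAfterDates AS
(
    -- Pair each row with the row for the same provider and patient that starts on the day
    -- this one ends; DateDiffInDays is 0 when such a row exists and NULL when it does not
    SELECT a.Provider, a.Patient, a.StartDate, a.RowNum, a.EndDate,
           b.StartDate AS EndStartDate,
           DATEDIFF(DAY, a.EndDate, b.StartDate) AS DateDiffInDays,
           b.EndDate AS EndEndDate, b.RowNum AS EndRowNum
    FROM RowsWithNum a
    LEFT JOIN RowsWithNum b
           ON b.Provider = a.Provider AND b.Patient = a.Patient AND b.StartDate = a.EndDate
)
-- Rows that have a match: first StartDate and last EndDate of the chain
SELECT Provider, Patient, MIN(StartDate) AS StartDate, MAX(EndEndDate) AS EndDate, MIN(RowNum) AS RowNum
FROM BeforeAndAfterDates
WHERE DateDiffInDays = 0
GROUP BY Provider, Patient
UNION
-- Rows with no match that are not the last link of a chain (the "alone" rows);
-- the join is restricted to the same provider and patient so other patients cannot hide them
SELECT a.Provider, a.Patient, a.StartDate, a.EndDate, a.RowNum
FROM BeforeAndAfterDates a
LEFT JOIN BeforeAndAfterDates b
       ON b.Provider = a.Provider AND b.Patient = a.Patient AND b.EndEndDate = a.EndDate
WHERE a.DateDiffInDays IS NULL AND b.RowNum IS NULL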

And here is my result.

[Image: query result]
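As an alternative to the self-join, the usual gaps-and-islands trick numbers each provider/patient pair's rows with ROW_NUMBER and subtracts that number from a month counter: consecutive months for the same pair produce the same difference, and any missing month (including a switch to another provider and back) changes it. Unlike grouping by Provider and Patient alone, as the query above does, this keeps two separate multi-month runs for the same pair apart (for example January to March and October to December). This is a sketch against the answer's test table, assuming one row per provider, patient, and month, with StartDate on the first day of the month.

```sql
-- Sketch of the "month number minus ROW_NUMBER" gaps-and-islands technique.
-- Assumes one row per provider, patient, and month, with StartDate on the first of the month.
WITH Islands AS
(
    SELECT Provider, Patient, StartDate, EndDate,
           -- Consecutive months for the same provider/patient share the same IslandKey;
           -- a missing month for that pair (e.g. time spent with another provider) changes it
           DATEDIFF(MONTH, '19000101', StartDate)
             - ROW_NUMBER() OVER (PARTITION BY Provider, Patient ORDER BY StartDate) AS IslandKey
    FROM test
)
SELECT Provider, Patient, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM Islands
GROUP BY Provider, Patient, IslandKey
ORDER BY Patient, MIN(StartDate);
```

Because the key is computed from the month of StartDate rather than from EndDate, this version does not depend on whether the last row ends on the 31st or on the first day of the next month.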