SQL Server – Return Most Recent Batch of Sequential Rows

sql serversql server 2014

Version:
Microsoft SQL Server 2014 (SP3-CU4) (KB4500181)

What i'm trying to accomplish here is pull in all the rows where there is a break of less than lets say 40 days between the next row.

So in the first example there is an uninterrupted break every month (roughly). So I would like the query to pull in all records here.

Example 1

CREATE table #Temp1
(
id Varchar(50),
DT8 int,
series int
) 

INSERT INTO #Temp1
VALUES
('147501',20181022,1),
('147502',20181115,1),
('147503',20181211,1),
('147504',20190110,1),
('147506',20190306,1),
('147505',20190205,1),
('147507',20190402,1),
('147508',20190429,1),
('147509',20190529,1),
('147510',20190603,1),
('147511',20190703,1),
('147512',20190802,1),
('177901',20190905,1),
('177902',20191001,1),
('177903',20191108,1),
('177904',20191205,1),
('190401',20200103,1),
('190402',20200205,1)

In example two there is a gap greater than 40 days between Sep 25 2019 and January 29 2020. So I would like the query to just pull in the most recent subsequent block. In this case it would just be the top record.

Example 2

INSERT INTO #Temp1
VALUES
('164701',20190605,2),
('164702',20190703,2),
('164703',20190802,2),
('164704',20190904,2),
('164705',20190925,2),
('189101',20200129,2)

I have started down this road, and was looking at using LEAD to calculate the number of days between the current and previous rows. I realize I probably need to break the years out to account for the case when moving to a new year or convert it to a real date so that I can use some sql functions to calculate the difference in days for me.

After that I wasn't sure how to go about only returning the most recent consecutive block. Thought I would ask here to see if anyone had any insight on how to accomplish this.

Best Answer

WITH 
cte1 AS 
(
SELECT series, 
       id, 
       `date`, 
       DATEDIFF(LEAD(STR_TO_DATE(`date`, '%Y%m%d')) OVER (PARTITION BY series
                                                          ORDER BY `date`),
                STR_TO_DATE(`date`, '%Y%m%d')) diff
FROM test
),
cte2 AS 
(
SELECT series, 
       id, 
       `date`, 
       COALESCE(SUM(diff > 40) OVER (PARTITION BY series
                                     ORDER BY `date` DESC), 0) grp
FROM cte1
)
SELECT series, 
       id, 
       `date` 
FROM cte2 
WHERE grp = 0
ORDER BY 1,3

fiddle