SQL Server – How to Find Records with Disjoint Dates

sql server

Consider the following table,

CREATE TABLE temp ( 
    [TabName] VARCHAR(255),
    [ID] VARCHAR(255),
    [AsOfDate] DATE)

  INSERT INTO temp VALUES 
    ('TAB1', 'C103', '2019-05-01'),
    ('TAB1', 'C103', '2019-06-01'),
    ('TAB2', 'C103', '2019-06-01'),
    ('TAB2', 'C103', '2019-07-01'),
    ('TAB1', 'C103', '2019-09-01'),
    ('TAB1', 'C104', '2019-06-01'),
    ('TAB1', 'C104', '2019-08-01')
TabName     ID      AsOfDate      
TAB1        C103     '2019-05-01'            
TAB1        C103     '2019-06-01'                        
TAB2        C103     '2019-06-01'                                    
TAB2        C103     '2019-07-01'     
TAB1        C103     '2019-09-01'
TAB1        C104     '2019-06-01'
TAB1        C104     '2019-08-01'
TAB1        C105     '2019-04-01'
TAB1        C105     '2019-05-01' 

I am wanting to find the IDs from the table that have skipped dates. In this table, I would want to identify that ID C103 and C104 have skipped dates as they have jumped from '2019-07-01' to '2019-09-01' and '2019-06-01' to '2019-8-01' respectively.

I have found the following question asked previously Find Missing Dates in Data which I believe gives some potential clues on the approach, i.e. using CTE, however, I am unsure how to apply them to this question where the dates are not consecutive across the whole table.

Should we be looking at partitioning here?

Best Answer

You can use a recursive CTE to achieve this, in conjunction with two windowing functions (ROW_NUMBER and RANK), to produce a running total of the date difference between the current date value and the previous value (by tabname & id group). You then SELECT only those rows with a date difference higher than 1.

If you're on SQL Server 2012 or higher, there is a much simpler way. Use the LAG function to retrieve the previous value (grouped by tabname and id) and performt he DATEDIFF on that.

Examples are included below, and you can see them in action in this db<>fiddle.

LAG Example:

SELECT
  TabName,
  Id,
  AsOfDate
FROM
(
  SELECT
    TabName,
    Id,
    AsOfDate,
    ISNULL(DATEDIFF(DAY, LAG(AsOfDate) OVER (PARTITION BY TabName, Id ORDER BY AsOfDate), AsOfDate), 0) AS PrevDateDiff
  FROM Temp
) t
WHERE t.PrevDateDiff > 1
ORDER BY tabname, id, asofdate

Recursive CTE example:

;WITH CTE AS (
  SELECT
    TabName,
    Id,
    AsOfDate,
    RANK() OVER (ORDER BY TabName, Id) AS Grp,
    ROW_NUMBER() OVER (PARTITION BY TabName, Id ORDER BY AsOfDate) AS Rn
  FROM Temp
), CTE2 AS
(
  SELECT c1.TabName,
    c1.Id,
    c1.AsOfDate,
    c1.Grp,
    c1.Rn,
    0 AS DateDiff
  FROM CTE c1
  WHERE c1.Rn = 1
  UNION ALL
  SELECT c1.TabName,
    c1.Id,
    c1.AsOfDate,
    c1.Grp,
    c1.Rn,
    DATEDIFF(DAY, c2.AsOfDate, c1.AsOfDate) AS DateDiff
  FROM CTE c1
  INNER JOIN CTE2 c2 ON c2.Rn = c1.Rn - 1 AND c2.Grp = c1.Grp
  WHERE c1.Rn > 1
)

SELECT
  TabName,
  Id,
  AsOfDate
FROM CTE2
WHERE DateDiff > 1
ORDER BY tabname, id, asofdate