SQL Server – How to Find Records with Disjoint Dates


Consider the following table,

CREATE TABLE temp (
    [TabName] VARCHAR(255),
    [ID] VARCHAR(255),
    [AsOfDate] DATE);

INSERT INTO temp VALUES
    ('TAB1', 'C103', '2019-05-01'),
    ('TAB1', 'C103', '2019-06-01'),
    ('TAB2', 'C103', '2019-06-01'),
    ('TAB2', 'C103', '2019-07-01'),
    ('TAB1', 'C103', '2019-09-01'),
    ('TAB1', 'C104', '2019-06-01'),
    ('TAB1', 'C104', '2019-08-01'),
    ('TAB1', 'C105', '2019-04-01'),
    ('TAB1', 'C105', '2019-05-01');

TabName  ID    AsOfDate
TAB1     C103  2019-05-01
TAB1     C103  2019-06-01
TAB2     C103  2019-06-01
TAB2     C103  2019-07-01
TAB1     C103  2019-09-01
TAB1     C104  2019-06-01
TAB1     C104  2019-08-01
TAB1     C105  2019-04-01
TAB1     C105  2019-05-01

I want to find the IDs in the table that have skipped dates. In this table, I would want to identify that IDs C103 and C104 have skipped dates, as they have jumped from '2019-07-01' to '2019-09-01' and from '2019-06-01' to '2019-08-01' respectively.

I found a previously asked question, Find Missing Dates in Data, which I believe gives some clues on the approach (i.e. using a CTE). However, I am unsure how to apply it to this question, where the dates are not consecutive across the whole table.

Should we be looking at partitioning here?

Best Answer

You can use a recursive CTE to achieve this, in conjunction with two window functions (ROW_NUMBER and RANK), to compute the date difference between each row's date and the previous row's date within the same TabName and Id group. You then SELECT only those rows with a date difference greater than 1.

If you're on SQL Server 2012 or higher, there is a much simpler way. Use the LAG function to retrieve the previous value (partitioned by TabName and Id) and perform the DATEDIFF on that.

Examples are included below, and you can see them in action in this db<>fiddle.

LAG Example:

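-- Return each row whose date is more than one month after the previous
-- date for the same TabName and Id.
SELECT
  TabName,
  Id,
  AsOfDate
FROM
(
  SELECT
    TabName,
    Id,
    AsOfDate,
    -- The sample dates are monthly, so the gap is measured in months;
    -- with DAY, every consecutive month (28 to 31 days apart) would be returned.
    -- The first row of each group has no previous date, so LAG returns NULL
    -- and ISNULL turns that into a gap of 0.
    ISNULL(DATEDIFF(MONTH, LAG(AsOfDate) OVER (PARTITION BY TabName, Id ORDER BY AsOfDate), AsOfDate), 0) AS PrevDateDiff
  FROM Temp
) t
WHERE t.PrevDateDiff > 1
ORDER BY TabName, Id, AsOfDate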
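
The question asks for the IDs rather than the individual rows, so the same LAG idea can be reduced to one row per Id. The following is only a sketch under two assumptions: it uses the Temp table from the question, and it measures gaps in months because the sample dates all fall on the first of a month. MonthsSincePrev is a name made up for illustration.

```sql
-- Sketch: list each Id that has at least one gap of more than one month.
SELECT DISTINCT
  t.Id
FROM
(
  SELECT
    Id,
    -- Months between this row's date and the previous date in the same
    -- TabName/Id group; NULL for the first row of each group.
    DATEDIFF(MONTH, LAG(AsOfDate) OVER (PARTITION BY TabName, Id ORDER BY AsOfDate), AsOfDate) AS MonthsSincePrev
  FROM Temp
) t
WHERE t.MonthsSincePrev > 1   -- NULL never satisfies > 1, so first rows drop out
ORDER BY t.Id;
```

For the sample rows in the question this should return C103 and C104. If your dates are daily rather than monthly, switch the DATEDIFF unit to DAY.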

Recursive CTE example:

;WITH CTE AS (
  SELECT
    TabName,
    Id,
    AsOfDate,
    RANK() OVER (ORDER BY TabName, Id) AS Grp,
    ROW_NUMBER() OVER (PARTITION BY TabName, Id ORDER BY AsOfDate) AS Rn
  FROM Temp
), CTE2 AS
(
  SELECT c1.TabName,
    c1.Id,
    c1.AsOfDate,
    c1.Grp,
    c1.Rn,
    0 AS DateDiff
  FROM CTE c1
  WHERE c1.Rn = 1
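  UNION ALL
  -- Recursive member: join each row to the previous row (Rn - 1) of the same group.
  SELECT c1.TabName,
    c1.Id,
    c1.AsOfDate,
    c1.Grp,
    c1.Rn,
    -- Measured in months because the sample dates are monthly;
    -- with DAY, every consecutive month would be returned.
    DATEDIFF(MONTH, c2.AsOfDate, c1.AsOfDate) AS DateDiff
  FROM CTE c1
  INNER JOIN CTE2 c2 ON c2.Rn = c1.Rn - 1 AND c2.Grp = c1.Grp
  WHERE c1.Rn > 1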
)

SELECT
  TabName,
  Id,
  AsOfDate
FROM CTE2
WHERE DateDiff > 1
ORDER BY tabname, id, asofdate
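
If LAG is not available, a plain self-join on the row number may be enough, without any recursion. This is an alternative sketch rather than the approach shown above: it assumes the same Temp table and monthly dates, and the names Numbered, GapStart and GapEnd are invented for illustration.

```sql
-- Sketch: number the rows per TabName/Id, then join each row to its predecessor.
;WITH Numbered AS (
  SELECT
    TabName,
    Id,
    AsOfDate,
    ROW_NUMBER() OVER (PARTITION BY TabName, Id ORDER BY AsOfDate) AS Rn
  FROM Temp
)
SELECT
  cur.TabName,
  cur.Id,
  prev.AsOfDate AS GapStart,  -- last date before the gap
  cur.AsOfDate AS GapEnd      -- first date after the gap
FROM Numbered cur
INNER JOIN Numbered prev
  ON prev.TabName = cur.TabName
  AND prev.Id = cur.Id
  AND prev.Rn = cur.Rn - 1
WHERE DATEDIFF(MONTH, prev.AsOfDate, cur.AsOfDate) > 1
ORDER BY cur.TabName, cur.Id, cur.AsOfDate;
```

Returning both ends of the gap makes it easier to see which dates were skipped, not just that a skip happened.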

Related Solutions

SQL Server – Invalid data types in RESTORE HEADERONLY result set

First, your first questions

I would use TINYINT for the BYTE(1); in this case they told us the possible values are 1 or 0, so BIT may also work. uint64, however, is an unsigned 64-bit integer, and BIGINT is signed, so its maximum value is lower. Technically speaking, a DECIMAL(20,0) or greater precision would be needed here. But in later versions of that same article this is a BIGINT (for SQL Server 2008 R2 and SQL Server 2012), so I am sure you are fine with BIGINT here. If you get enough disk space and time to create a database big enough to compress to a value that blows BIGINT, you can test this theory out someday ;-)
There should be no undesired behavior if you go with SMALLINT/BIGINT/DECIMAL(20,0).
I am not sure I understand your question, but if you are asking what I think you are asking, I believe the answer is conversion, though this is potentially just an oops.
I'm not sure why those data types are in the documentation, but you've chosen good logical approximations.
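
To make the BIGINT versus DECIMAL(20,0) point concrete, here is a small sketch comparing the two ranges. The variable names are made up for illustration, and TRY_CAST requires SQL Server 2012 or later.

```sql
-- Largest value an unsigned 64-bit integer can hold (2^64 - 1): 20 digits.
DECLARE @MaxUInt64 DECIMAL(20,0) = 18446744073709551615;
-- Largest value a signed BIGINT can hold (2^63 - 1): 19 digits.
DECLARE @MaxBigInt BIGINT = 9223372036854775807;

SELECT
  @MaxUInt64 AS MaxUInt64,
  @MaxBigInt AS MaxBigInt,
  -- TRY_CAST returns NULL when the conversion fails instead of raising an
  -- error, so this column is expected to be NULL.
  TRY_CAST(@MaxUInt64 AS BIGINT) AS UInt64AsBigInt;
```

In practice a backup would have to be larger than roughly 9.2 exabytes before BIGINT became too small, which is why BIGINT is a reasonable choice here.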

Then the last question

I hate to shove off on this one, but I'm kind of going to do that. There are a lot of great restore scripts out there on the internet for different scenarios. You haven't fully described yours, so I'm not sure I can comment on the efficiency or elegance, but you are right to read the headers to determine what you do next. Some questions to ask yourself:

Are you looking at things like the date to ensure you restore the latest? Are you looking at things like full/diff/log backups and accounting for them in the restore? What purpose is this for: restoring a dev environment, or a production restore? If it's a dev restore, I like to go more automated. If it's a prod restore, I like to have a script that eliminates some of the "oops" factor from a critical production restore, but does not automate so much that it becomes easy to forget a critical step, such as backing up the tail of the log. I'd search for restore scripts and see what others have done, ask yourself these questions, and incorporate what you like.

I am also not sure you need to know whether the file is compressed or what the compressed size is. Those facts shouldn't be terribly necessary for a restore script, since SQL Server just handles the restore of a compressed backup for you. You don't have to tell SQL Server it is compressed. So you may just drop those columns altogether and take only what you require from the header to perform your restore.
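
As a starting point for reading the headers, a minimal sketch looks like this. The file path is a placeholder, not something from the question.

```sql
-- Hypothetical backup path; replace it with a real backup file.
RESTORE HEADERONLY
FROM DISK = N'C:\Backups\MyDatabase.bak';

-- Columns in the result set that matter most for the questions above:
--   BackupType       1 = full database, 2 = transaction log, 5 = differential database
--   BackupFinishDate when the backup completed, useful for picking the latest
--   DatabaseName     which database the backup belongs to
--   FirstLSN/LastLSN useful for checking that log backups form an unbroken chain
```

A restore script would typically capture this result set and branch on BackupType and BackupFinishDate, but the exact column list varies by SQL Server version, so check it against your own instance.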

Find Transactions Filling Up the Version Store in SQL Server

It doesn't really make sense to track version store by session, or by transaction, or by query. If two different users are making use of the same version of a row/table, who owns it?

You can track this by object, though, which can help you narrow down which modules are causing the churn. Have a look at sys.dm_tran_top_version_generators:

USE [your database];
GO
SELECT obj = 
  QUOTENAME(OBJECT_SCHEMA_NAME(p.object_id))
  + '.' + QUOTENAME(OBJECT_NAME(p.object_id)),
  vs.aggregated_record_length_in_bytes
FROM sys.dm_tran_top_version_generators AS vs
INNER JOIN sys.partitions AS p
ON vs.rowset_id = p.hobt_id
WHERE vs.database_id = DB_ID()
AND p.index_id IN (0,1);

And on SQL Server 2008+, you can also figure out which modules reference these tables by adding sys.dm_sql_referencing_entities:

SELECT 
  obj = QUOTENAME(OBJECT_SCHEMA_NAME(p.object_id))
  + '.' + QUOTENAME(OBJECT_NAME(p.object_id)),
  referenced_by = QUOTENAME(r.referencing_schema_name)
  + '.' + QUOTENAME(r.referencing_entity_name),
  vs.aggregated_record_length_in_bytes AS size
FROM sys.dm_tran_top_version_generators AS vs
INNER JOIN sys.partitions AS p
ON vs.rowset_id = p.hobt_id
CROSS APPLY sys.dm_sql_referencing_entities
(
  QUOTENAME(OBJECT_SCHEMA_NAME(p.object_id))
  + '.' + QUOTENAME(OBJECT_NAME(p.object_id)), 'OBJECT'
) AS r
WHERE vs.database_id = DB_ID()
AND p.index_id IN (0,1)
ORDER BY size DESC, referenced_by;

This assumes that none of the version store could be created by ad hoc queries. However, it doesn't tell you which of those modules could be causing it - hopefully the naming scheme is logical and helps you narrow it down a bit.

(On 2005 you might be able to go through sysdepends and other old-style dependency views but I'm not 100% sure how reliable that would be.)
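
Before drilling into objects, it can help to know how large the version store is overall. A minimal sketch, assuming the version store lives in tempdb (as it does when Accelerated Database Recovery is not in use); version_store_kb is just an illustrative alias:

```sql
-- Total space reserved for the version store across all tempdb data files.
-- Pages are 8 KB each, so multiply the page count by 8 to get kilobytes.
SELECT
  SUM(version_store_reserved_page_count) * 8 AS version_store_kb
FROM tempdb.sys.dm_db_file_space_usage;
```

Sampling this value over time shows whether the store is actually growing, which tells you whether the per-object breakdown above is worth pursuing.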
