SQL Server – Count Items Not in Time Range

ctedaterecursivesql servert-sql

I'm attempting to count the number of items if they are not within 30 seconds of the first item in a "group". I'm having a hard time figuring this out.

So, I have this table:

WITH ADates AS (
    SELECT
          Id
        , SharedId
        , TheDateTime
    FROM (VALUES
        (CAST(1 AS int), CAST(1 AS int), CAST('2019-01-01 01:01:00.00' AS datetime2(7))),
        (2, 1, '2019-01-01 01:01:33.00'),
        (3, 1, '2019-01-01 01:02:00.00'),
        (4, 1, '2019-01-01 01:02:01.00'),
        (5, 1, '2019-01-01 01:02:04.00'),
        (6, 1, '2019-01-01 01:06:15.00'),
        (7, 2, '2019-01-01 01:06:00.00'),
        (8, 2, '2019-01-01 01:06:45.00'),
        (9, 1, '2019-01-01 01:02:31.00'),
        (10, 2, '2019-01-01 01:06:05.00'),
        (11, 2, '2019-01-01 01:06:46.00'),
    ) X (Id, SharedId, TheDateTime)
)

So, the expected result that I'm looking for is:

+==========+=======+
| SharedId | Count |
+==========+=======+
| 1        | 4     |
+----------+-------+
| 2        | 2     |
+----------+-------+

The numbers are determined by:

Count since first in new group.
Not within 30 seconds of previous group so it is a new group and count.
Don't count since it is within 30 seconds of 2.
Don't count since it is within 30 seconds of 2.
Count since not within 30 seconds of previous group (Item 2).
Count since not within 30 seconds of previous group (item 2).
Count in new group for SharedId.
Count since not within previous grouping.

I'm thinking I should be doing a Window Function for this. Just not sure how to have it rely on just the first of the group.

Best Answer

Problem Categorization

I was looking for a way to use an analytical function that keeps track of inline manipulations. A single run analytical function is only able to perform so much, but not to the extend to solve this problem. The problem with nesting analytical functions is that we loose information about our dynamic pattern.

To allow dynamic inline pattern matching, in Oracle you can use MATCH_RECOGNIZE. I had no clue how to do it in Sql Server though. Then I came across a similar problem, which got resolved using a recursive CTE.

Proposed Solution

Fiddle with solution

- SharedId    GroupStartDateTime
- 1           01/01/2019 01:01:00
- 1           01/01/2019 01:01:33
- 1           01/01/2019 01:02:04
- 1           01/01/2019 01:06:15
- 2           01/01/2019 01:06:00
- 2           01/01/2019 01:06:45 
6 rows

CteBase and CteRecursive are heavely inspired by Bogdan Sahlean's answer on this related question.

WITH CteBase
AS
(
        SELECT  v.SharedId,
                v.TheDateTime,
                ROW_NUMBER() OVER(PARTITION BY v.SharedId ORDER BY v.TheDateTime) 
                       AS RowNum
        FROM    ADates v
),  CteRecursive
AS
(
        SELECT  crt.SharedId,
                crt.TheDateTime,
                crt.TheDateTime AS GroupStartDateTime,
                crt.RowNum,
                1 AS SharedIdRowNum
        FROM    CteBase crt
        WHERE   crt.RowNum = 1
        UNION ALL
        SELECT  crt.SharedId,
                crt.TheDateTime,
                CASE 
                    WHEN DATEDIFF(SECOND, prv.GroupStartDateTime, crt.TheDateTime) <= 30 
                    THEN prv.GroupStartDateTime 
                    ELSE crt.TheDateTime 
                END,
                crt.RowNum,
                CASE 
                    WHEN DATEDIFF(SECOND, prv.GroupStartDateTime, crt.TheDateTime) <= 30 
                    THEN prv.SharedIdRowNum + 1
                    ELSE 1
                END             
        FROM    CteBase crt
        INNER JOIN CteRecursive prv ON crt.SharedId = prv.SharedId 
        AND     crt.RowNum = prv.RowNum + 1
)
SELECT SharedId, Count(*) as [COUNT] FROM (
        SELECT  r.SharedId,
                r.GroupStartDateTime
        FROM    CteRecursive r
        WHERE   r.SharedIdRowNum = 1
) X
GROUP BY SharedId;

Related Solutions

SQL Queries – Show Most Recent Purchase Records

MS Access is rather limited.

I assume that it is possible to have more than one invoice for the same date. In this case I'll pick an invoice with the highest ID.

At first we'll find maximum Invoice Date for each Food Item.

SELECT
    FPD1.[Food item ID] AS ItemID
    ,MAX(I1.[Invoice Date]) AS MaxDate
FROM
    [Food purchase data] AS FPD1
    INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
GROUP BY
    FPD1.[Food item ID]

Since it is possible that there are several invoices for the found max date we'll pick one invoice with the max ID per Item

Based on the MS Access syntax of nested joins and using this example from the docs:

SELECT fields 
FROM 
  table1 INNER JOIN 
  (
      table2 INNER JOIN 
      (
          table3 INNER JOIN tablex ON table3.field3 = tablex.fieldx
      ) ON table2.field2 = table3.field3
  ) ON table1.field1 = table2.field2
;

Let's try to put it together:

SELECT
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate
    ,MAX(I2.ID) AS MaxInvoiceID
FROM
    (
        SELECT
            FPD1.[Food item ID] AS ItemID
            ,MAX(I1.[Invoice Date]) AS MaxDate
        FROM
            [Food purchase data] AS FPD1
            INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
        GROUP BY
            FPD1.[Food item ID]
    ) AS InvoicesMaxDate INNER JOIN
    (
        [Food purchase data] AS FPD2 
        INNER JOIN Invoices AS I2 ON I2.ID = FPD2.[Invoice ID]
    ) ON
        InvoicesMaxDate.ItemID = FPD2.[Food item ID] AND
        --- you may need to put extra "ON" here as well, not sure
        InvoicesMaxDate.MaxDate = I2.[Invoice Date]
GROUP BY
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate

Now we have both ItemID and ID of the last Invoice for that Item. Join this to original tables to fetch other details (columns).

SELECT
    FI3.Item
    ,FI3.Item
    ,FPD3.[Price per unit]
    ,FPD3.[Purchase unit]
    ,I3.[Invoice Date]
FROM
    (
        SELECT
            InvoicesMaxDate.ItemID
            ,InvoicesMaxDate.MaxDate
            ,MAX(I2.ID) AS MaxInvoiceID
        FROM
            (
                SELECT
                    FPD1.[Food item ID] AS ItemID
                    ,MAX(I1.[Invoice Date]) AS MaxDate
                FROM
                    [Food purchase data] AS FPD1
                    INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
                GROUP BY
                    FPD1.[Food item ID]
            ) AS InvoicesMaxDate INNER JOIN
            (
                [Food purchase data] AS FPD2 
                INNER JOIN Invoices AS I2 ON I2.ID = FPD2.[Invoice ID]
            ) ON
                InvoicesMaxDate.ItemID = FPD2.[Food item ID] AND
                InvoicesMaxDate.MaxDate = I2.[Invoice Date]
        GROUP BY
            InvoicesMaxDate.ItemID
            ,InvoicesMaxDate.MaxDate
    ) AS LastInvoices INNER JOIN
    (
        [Food items] AS FI3 INNER JOIN
        (
            [Food purchase data] AS FPD3
            INNER JOIN Invoices AS I3 ON I3.ID = FPD3.[Invoice ID]
        ) ON FI3.ID = FDP3.[Food item ID]
    ) ON
        LastInvoices.MaxInvoiceID = I3.ID AND
        LastInvoices.ItemID = FI3.ID

In practice I'd create a view for the first query with a single join. Then I'd create a second view that joins the first view with the tables, then the third view and so on, to avoid the nested joins or minimize them. Overall query would be easier to read.

Edit to clarify what I mean based on your final solution that you put into the question.

One last attempt to convey my message.

This is what you wrote based on my suggestions above:

SELECT
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate
    ,Invoices.[Invoice ID]
FROM [Food purchase data], Invoices, 
    (
        SELECT 
            [Food purchase data].[Food item ID] AS ItemID
            ,MAX(Invoices.[Invoice Date]) AS MaxDate
        FROM [Food purchase data], Invoices
        WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
        GROUP BY [Food purchase data].[Food item ID]
    )  AS InvoicesMaxDate
WHERE
    Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
    InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND 
    InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate, Invoices.[Invoice ID];

This is what I meant:

SELECT
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate
    ,MAX(Invoices.[Invoice ID]) AS [Invoice ID]
FROM [Food purchase data], Invoices, 
    (
        SELECT
            [Food purchase data].[Food item ID] AS ItemID
            ,MAX(Invoices.[Invoice Date]) AS MaxDate
        FROM [Food purchase data], Invoices
        WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
        GROUP BY [Food purchase data].[Food item ID]
    )  AS InvoicesMaxDate
WHERE
    Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
    InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND 
    InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate;

Do you see the difference?

The InvoicesMaxDate returns MAX Invoice Date for each Food item ID. If there are two invoices for the same Food item ID with the same MAX Invoice Date we should pick one invoice among them. This is done by grouping by InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate. There should be no grouping by Invoices.[Invoice ID] here, because we want to pick the invoice with the maximum ID.

Once you have this query saved as a LatestInvoices view, it is used further as you correctly wrote (note, that the final query uses LatestInvoices.[Invoice ID] and LatestInvoices.ItemID, but doesn't use LatestInvoices.MaxDate):

SELECT 
    [Food items].ID as FoodItemID
    ,[Food items].Item as FoodItem
    ,[Food purchase data].[Price]
    ,[Food purchase data].[Price per unit]
    ,[Food purchase data].[Purchase unit]
    ,Invoices.[Invoice Date]
FROM [Food items], [Food purchase data], Invoices, LatestInvoices
WHERE 
    Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
    [Food items].ID = [Food purchase data].[Food item ID] AND
    LatestInvoices.[Invoice ID] = Invoices.[Invoice ID] AND 
    LatestInvoices.ItemID = [Food items].ID
ORDER BY [Food items].Item

As for, why your last query in the question returns several rows per Item:

SELECT 
    [Food purchase data].[Food item ID]
    , [Food purchase data].[Price per unit]
    , max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID], [Food purchase data].[Price per unit];

You are grouping here by [Food item ID] and [Price per unit], so you'll get as many rows as there are unique combinations of these two columns.

The following query would return one row per [Food item ID].

SELECT 
    [Food purchase data].[Food item ID]
    , max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID];

A side note, you really should use explicit INNER JOIN instead of ,. That syntax is 20 years old.

SELECT 
    [Food purchase data].[Food item ID]
    , max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM
    [Food purchase data]
    INNER JOIN Invoices ON Invoices.ID = [Food purchase data].[Invoice ID]
GROUP BY [Food purchase data].[Food item ID];

Sql-server – Writing and reading date ranges for a million items

It looks like you're on the right path. Your datebit pattern idea is an interesting option, I'd like to hear if you make any more progress towards that design.

My suggestion is a pretty simple one. Why not split the ItemOfType out to their own tables and reference them with a surrogate id?. I know it isn't always a popular idea if you have good natural keys available, but in terms of space saving it could reduce your storage requirement a bit.

This was too much for a comment, please don't crucify me.

create table dbo.ItemOfType1 (
    ItemOfType1Id int identity(1,1) not null
  , ItemOfType1Code char(12) not null
  , ReferenceType bit not null
  , constraint pkc_ItemOfType1 primary key clustered (ItemOfType1Id)
  , constraint uq_ItemOfType1_ItemOfType1Code_ReferenceType unique nonclustered (ItemOfType1Code, ReferenceType)
);
create table dbo.ItemOfType2 (
    ItemOfType2Id int identity(1,1) not null
  , ItemOfType2Code char(7) not null
  , ReferenceType bit not null
  , constraint pkc_ItemOfType2 primary key clustered (ItemOfType2Id)
  , constraint uq_ItemOfType2_ItemOfType2Code_ReferenceType unique nonclustered (ItemOfType2Code, ReferenceType)
);
create table dbo.ItemOfType1Dates (
    ItemOfType1Id int  not null
  , ReferenceDate date not null
  , constraint pkc_ItemOfType1Dates primary key clustered (ReferenceDate,ItemOfType1Id)
  , constraint fk_ItemOfType1Dates_ItemOfType1_ItemOfType1Id foreign key (ItemOfType1Id) references dbo.ItemOfType1 (ItemOfType1Id)
);
create table dbo.ItemOfType2Dates (
    ItemOfType2Id int  not null
  , ReferenceDate date not null
  , constraint pkc_ItemOfType2Dates primary key clustered (ReferenceDate,ItemOfType2Id)
  , constraint fk_ItemOfType2Dates_ItemOfType2_ItemOfType2Id foreign key (ItemOfType2Id) references dbo.ItemOfType2 (ItemOfType2Id)
);
/*  -- date ranges alternate
create table dbo.ItemOfType1DateRanges (
    ItemOfType1Id int  not null
  , FromBusinessDate date not null
  , ToBusinessDate date not null     
  , constraint pkc_ItemOfType1DateRanges primary key clustered (FromBusinessDate,ItemOfType1Id)
  , constraint fk_ItemOfType1DateRanges_ItemOfType1_ItemOfType1Id foreign key (ItemOfType1Id) references dbo.ItemOfType1 (ItemOfType1Id)
);
create table dbo.ItemOfType2DateRanges (
    ItemOfType2Id int  not null
  , FromBusinessDate date not null
  , ToBusinessDate date not null     
  , constraint pkc_ItemOfType2DateRanges primary key clustered (FromBusinessDate,ItemOfType2Id)
  , constraint fk_ItemOfType2DateRanges_ItemOfType2_ItemOfType2Id foreign key (ItemOfType2Id) references dbo.ItemOfType2 (ItemOfType2Id)
);    
*/