Sql-server – Using CROSS APPLY with GROUP BY and TOP 1 with duplicate data

cross-applysql server

I have a table that contains Status information about Product items over time. Each row has a Modified DATETIME. I want to get the latest Status row, using the MODIFIED field, for each ProductNumber in one query. However the crux is that the MODIFIED field may contain duplicates, so when I join back onto ProductStatus mulitple records are returned.

It will be used in a VIEW and so I must able to have WHERE clause with "ProductNumber = 123" at the end.

Sample Data:

ID | DateCreated             | ProductNumber | Modified
====================================================================
1  | 2008-09-29 00:00:00.000 | 20070098      | 2014-10-10 20:22:59.467
2  | 2008-09-29 00:00:00.000 | 20070099      | 2014-11-10 20:22:59.467
3  | 2008-12-18 09:26:58.507 | 20070099      | 2014-12-10 20:22:59.467
4  | 2008-12-18 08:47:38.343 | 20070098      | 2014-10-10 20:22:59.467
6  | 2007-12-07 00:00:00.000 | 20070098      | 2014-10-10 20:22:59.467
5  | 2007-12-07 00:00:00.000 | 20070099      | 2014-02-10 20:22:59.467
11 | 2009-03-20 14:09:52.190 | 20070098      | 2014-10-10 20:22:59.467
34 | 2009-03-20 14:18:49.383 | 20070099      | 2014-10-10 20:22:59.467

SQL to create the data:

CREATE TABLE #ProductStatus ( ID INT, DateCreated DATETIME, ProductNumber INT, Modified DATETIME )

INSERT INTO #ProductStatus VALUES (1, '2008-09-29 00:00:00.000', 20070098, '2014-10-10 20:22:59.467')
INSERT INTO #ProductStatus VALUES (2, '2008-09-29 00:00:00.000', 20070099, '2014-11-10 20:22:59.467')
INSERT INTO #ProductStatus VALUES (3, '2008-12-18 09:26:58.507', 20070099, '2014-12-10 20:22:59.467')
INSERT INTO #ProductStatus VALUES (4, '2008-12-18 08:47:38.343', 20070098, '2014-10-10 20:22:59.467')
INSERT INTO #ProductStatus VALUES (6, '2007-12-07 00:00:00.000', 20070098, '2014-10-10 20:22:59.467')
INSERT INTO #ProductStatus VALUES (5, '2007-12-07 00:00:00.000', 20070099, '2014-02-10 20:22:59.467')
INSERT INTO #ProductStatus VALUES (11, '2009-03-20 14:09:52.190', 20070098, '2014-10-10 20:22:59.467')
INSERT INTO #ProductStatus VALUES (34, '2009-03-20 14:18:49.383', 20070099, '2014-10-10 20:22:59.467')

Getting the MAX Modified when Grouping by the ProductNumber,

SELECT ProductNumber, MAX(Modified) AS MaxModified
FROM #ProductStatus
GROUP BY ProductNumber

returns

ProductNumber | MaxModified
===========================
20070098      | 2014-10-10 20:22:59.467
20070099      | 2014-12-10 20:22:59.467

Based on this sample data the final recordset I am looking for is:

ID | DateCreated             | ProductNumber | Modified
====================================================================
1  | 2008-09-29 00:00:00.000 | 20070098      | 2014-10-10 20:22:59.467
3  | 2008-12-18 09:26:58.507 | 20070099      | 2014-12-10 20:22:59.467

Using an INNER JOIN and a TOP 1 to get the ID from ProductStatus based on the MaxModified and ProductNumber,

SELECT MainProductStatus.* 
FROM #ProductStatus MainProductStatus
INNER JOIN (
    SELECT TOP 1 LatestProductStatus.ID
    FROM #ProductStatus LatestProductStatus
    INNER JOIN (
        SELECT SubLatestProductStatus.ProductNumber, MAX(SubLatestProductStatus.Modified) AS MaxModified
        FROM #ProductStatus SubLatestProductStatus
        GROUP BY ProductNumber
    ) MaxProductStatus
        ON MaxProductStatus.ProductNumber = LatestProductStatus.ProductNumber
        AND MaxProductStatus.MaxModified = LatestProductStatus.Modified
        ORDER BY LatestProductStatus.DateCreated DESC
) AS ProductStatusLatestSubQuery ON ProductStatusLatestSubQuery.ID = MainProductStatus.ID

results in this recordset, getting the TOP 1 from all of the MAX items:

ID | DateCreated             | ProductNumber | Modified
====================================================================
11 | 2009-03-20 14:09:52.190 | 20070098      | 2014-10-10 20:22:59.467

I then looked into CROSS APPLY and OUTER APPLY further but am getting mixed results, e.g.

SELECT MainProductStatus.* , ProductStatusLatestSubQuery.*
FROM #ProductStatus MainProductStatus
CROSS APPLY (
    SELECT TOP 1 ID
    FROM #ProductStatus LatestProductStatus
    INNER JOIN (
        SELECT SubLatestProductStatus.ProductNumber, MAX(SubLatestProductStatus.Modified) AS MaxModified
        FROM #ProductStatus SubLatestProductStatus
        GROUP BY ProductNumber
    ) MaxProductStatus
        ON MaxProductStatus.ProductNumber = LatestProductStatus.ProductNumber
        AND MaxProductStatus.MaxModified = LatestProductStatus.Modified
    WHERE LatestProductStatus.ID = MainProductStatus.ID
    ORDER BY LatestProductStatus.DateCreated DESC
) AS ProductStatusLatestSubQuery

returns:

ID | DateCreated             | ProductNumber | Modified                | ID
===========================================================================
1  | 2008-09-29 00:00:00.000 | 20070098      | 2014-10-10 20:22:59.467 | 1
3  | 2008-12-18 09:26:58.507 | 20070099      | 2014-12-10 20:22:59.467 | 3
4  | 2008-12-18 08:47:38.343 | 20070098      | 2014-10-10 20:22:59.467 | 4
6  | 2007-12-07 00:00:00.000 | 20070098      | 2014-10-10 20:22:59.467 | 6
11 | 2009-03-20 14:09:52.190 | 20070098      | 2014-10-10 20:22:59.467 | 11

I'm really not sure why the CROSS APPLY isn't working as I expect. Maybe I need to get the DISTINCT ProductNumber records first to avoid extra-joining, but that doesn't help me get the end data.


I've checked out any similar items I could find on here before posting. This is my first question, feedback welcomed. TIA.

Best Answer

If you're looking for MAX(Modified) field over ProductNumber, you can use ROW_NUMBER() function, and then get all rows where row number = 1.

WITH selMax AS
(
    SELECT ID, ProductNumber, DateCreated, Modified,
           ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Modified DESC, 
                                                                  DateCreated DESC) RNum
    FROM   #ProductStatus
)
SELECT ID, ProductNumber, DateCreated, Modified
FROM   selMax
WHERE  RNum = 1
GO
ID | ProductNumber | DateCreated         | Modified           
-: | ------------: | :------------------ | :------------------
11 |      20070098 | 20/03/2009 14:09:52 | 10/10/2014 20:22:59
 3 |      20070099 | 18/12/2008 09:26:58 | 10/12/2014 20:22:59

Filtering by ProductNumber:

WITH selMax AS
(
    SELECT ID, ProductNumber, DateCreated, Modified,
           ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Modified DESC, 
                                                                  DateCreated DESC) RNum
    FROM   #ProductStatus
    WHERE  ProductNumber = 20070098
)
SELECT ID, ProductNumber, DateCreated, Modified
FROM   selMax
WHERE  RNum = 1
GO
ID | ProductNumber | DateCreated         | Modified           
-: | ------------: | :------------------ | :------------------
11 |      20070098 | 20/03/2009 14:09:52 | 10/10/2014 20:22:59

dbfiddle here