SQL Server – Finding Most Recently Updated Record with Multiple Date Columns

Tags: date, sql-server-2016, t-sql

I have recently inherited a SQL Server 2016 database which was converted from an old MS Access db. I have a table that shows various repair dates and test dates for associated serial numbers. I would like to have a single row per serial number with all of the most recent dates shown. The table right now is showing multiple rows per serial number depending on the various dates entered into the database.

Repair and Config Date Table:

+------+-------------+-----------------+---------------+
| S/N  | REPAIR_DATE | INSPECTION_DATE | REPLACED_DATE |
+------+-------------+-----------------+---------------+
| 1001 | 2017-11-17  | 2017-11-17      | NULL          |
| 1002 | NULL        | NULL            | NULL          |
| 1002 | 2016-11-21  | 2016-11-21      | NULL          |
| 1004 | NULL        | NULL            | NULL          |
| 1004 | NULL        | 2017-03-28      | 2017-09-07    |
| 1004 | 2017-12-15  | NULL            | NULL          |
+------+-------------+-----------------+---------------+

Desired Output

+------+-------------+-----------------+---------------+
| S/N  | REPAIR_DATE | INSPECTION_DATE | REPLACED_DATE |
+------+-------------+-----------------+---------------+
| 1001 | 2017-11-17  | 2017-11-17      | NULL          |
| 1002 | 2016-11-21  | 2016-11-21      | NULL          |
| 1004 | 2017-12-15  | 2017-03-28      | 2017-09-07    |
+------+-------------+-----------------+---------------+

How can I combine the table results so that each serial number has a single row showing the latest date from each column?

Best Answer

You can get the aggregates, MAX(Date) in this case, by using GROUP BY. MAX ignores NULLs, so each column returns the latest non-NULL date for the serial number, or NULL when every row in the group is NULL.

Quoted from MS Docs:

"A SELECT statement clause that divides the query result into groups of rows, usually for the purpose of performing one or more aggregations on each group. The SELECT statement returns one row per group."

SELECT
    SerialNumber,
    MAX(Repair_Date) as Repair_Date,
    MAX(Inspection_Date) as Inspection_Date,
    MAX(Replaced_Date) as Replaced_Date 
FROM
    YourTable
GROUP BY 
    SerialNumber
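To try the query without touching a real table, here is a minimal sketch that loads the question's sample rows into a table variable. The name @RepairDates and the column name SerialNumber are assumptions for illustration; if the real column is literally named S/N, it would have to be quoted as [S/N].

```sql
-- Hypothetical table variable mirroring the sample rows from the question
DECLARE @RepairDates TABLE (
    SerialNumber    int,
    Repair_Date     date,
    Inspection_Date date,
    Replaced_Date   date
);

INSERT INTO @RepairDates (SerialNumber, Repair_Date, Inspection_Date, Replaced_Date)
VALUES
    (1001, '2017-11-17', '2017-11-17', NULL),
    (1002, NULL,         NULL,         NULL),
    (1002, '2016-11-21', '2016-11-21', NULL),
    (1004, NULL,         NULL,         NULL),
    (1004, NULL,         '2017-03-28', '2017-09-07'),
    (1004, '2017-12-15', NULL,         NULL);

-- One row per serial number, with the latest value of each date column
SELECT
    SerialNumber,
    MAX(Repair_Date)     AS Repair_Date,
    MAX(Inspection_Date) AS Inspection_Date,
    MAX(Replaced_Date)   AS Replaced_Date
FROM @RepairDates
GROUP BY SerialNumber
ORDER BY SerialNumber;
```

The three rows returned should match the Desired Output table in the question, including the NULL Replaced_Date for 1001 and 1002.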

Related Solutions

SQL Server – The best way to get all data for a date range, plus the last event just before the range

I am going to assume that there isn't an index on the date columns; otherwise I think the query would have been structured differently. If there is one, a better-performing query than this probably exists.

The advantage of this query is that it can get all the data in one scan. The disadvantage is that it has to sort the data and join EventEmployee on the entire table. So as always, test with your own situation. This query also assumes that the MAX date is either unique or that equivalent rows would be acceptable.

USE AdventureWorks2012
GO
;
WITH Base AS (
   SELECT 
      TransactionHistory.*
      ,ProductVendor.BusinessEntityID
      ,MAX(CASE WHEN TransactionDate < '2008-08-01' THEN TransactionDate END) 
           OVER (PARTITION BY ProductVendor.BusinessEntityID) AS PreviousVendorTransaction
      ,COUNT(CASE WHEN TransactionDate >= '2008-08-01' THEN 1 END ) 
           OVER (PARTITION BY ProductVendor.BusinessEntityID) AS VendorAfterCutoff
   FROM
      Production.TransactionHistory
      -- Doesn't make the most sense, but I need a repeating relation
      INNER JOIN Purchasing.ProductVendor
         ON TransactionHistory.ProductID = ProductVendor.ProductID
),
Filtered AS (
   SELECT
      *
   FROM
      Base
   WHERE
      Base.TransactionDate >= '2008-08-01'
      OR (TransactionDate = PreviousVendorTransaction AND VendorAfterCutoff > 0)
)
SELECT DISTINCT
   TransactionID
   ,ProductID
   ,ReferenceOrderID
   ,ReferenceOrderLineID
   ,TransactionDate
   ,TransactionType
   ,Quantity
   ,ActualCost
   ,ModifiedDate
FROM
   Filtered
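The same windowed-aggregate idea can be tried on a much smaller data set. The sketch below is only an illustration: the table variable @Events, its columns, and the @Cutoff variable are hypothetical and do not come from AdventureWorks.

```sql
-- Hypothetical events: one row per (event, employee)
DECLARE @Events TABLE (EventID int, EmployeeID int, EventDate date);

INSERT INTO @Events (EventID, EmployeeID, EventDate)
VALUES
    (1, 10, '2008-06-01'),
    (2, 10, '2008-07-15'),  -- last event before the cutoff for employee 10
    (3, 10, '2008-08-10'),
    (4, 20, '2008-07-01'),  -- employee 20 has nothing on or after the cutoff
    (5, 30, '2008-09-01');

DECLARE @Cutoff date = '2008-08-01';

WITH Base AS (
    SELECT
        e.EventID,
        e.EmployeeID,
        e.EventDate,
        -- latest date before the cutoff, per employee
        MAX(CASE WHEN e.EventDate < @Cutoff THEN e.EventDate END)
            OVER (PARTITION BY e.EmployeeID) AS PreviousDate,
        -- number of rows on or after the cutoff, per employee
        COUNT(CASE WHEN e.EventDate >= @Cutoff THEN 1 END)
            OVER (PARTITION BY e.EmployeeID) AS AfterCutoff
    FROM @Events AS e
)
SELECT EventID, EmployeeID, EventDate
FROM Base
WHERE EventDate >= @Cutoff
   OR (EventDate = PreviousDate AND AfterCutoff > 0)
ORDER BY EventID;
```

With these rows the query should return events 2, 3 and 5: everything on or after the cutoff, plus the single most recent earlier event for an employee who also has an event in range. Event 4 is dropped because employee 20 has no row on or after the cutoff.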

Edit:

Hmm, I think I may have to take back my comment on structuring it differently if there are indexes. The other suggestions that I have are probably fairly minor.

Make sure the query uses the indexes you expect: the start and end date indexes to build the temp table, and the end date index to drive the previous-event lookup.
If the query that builds the temp table does a lookup on the clustered index, it may be better to hold off and do that lookup as part of the main query.
Try using a CTE instead of a temp table. I think a CTE might be more competitive given the way the query below is structured.
If you are returning a lot of events, it might be better to move the event table lookup into the main query, which gives the optimizer the option of doing a merge join.
I don't see a way of optimizing the previous-event lookup short of an indexed view.

Here's a query that combines a few of those ideas.

SELECT
    e.[EventID]
INTO #EventTemp
FROM
    [Events] AS e
WHERE
    ( e.[EventStart] >= @StartDate AND e.[EventStart] <= @EndDate )
    OR ( e.[EventEnd] >= @StartDate AND e.[EventEnd] <= @EndDate )

;
WITH PrevEvent AS (
    SELECT
        EmpPrevEvent.[EventID]
    FROM
    (
        SELECT DISTINCT
            ee.[EmployeeID]
        FROM
            #EventTemp
            INNER JOIN [EventEmployee] AS ee ON
                #EventTemp.[EventID] = ee.[EventID]
    ) AS Emp
    CROSS APPLY (
        SELECT TOP 1
            e.[EventID]
        FROM
            [Events] AS e
            INNER JOIN [EventEmployee] AS ee ON
                e.[EventID] = ee.[EventID]
        WHERE
            ee.[EmployeeID] = Emp.[EmployeeID]
            AND e.[EventEnd] < @StartDate
        ORDER BY 
            e.[EventEnd] DESC
    ) AS EmpPrevEvent
)
SELECT
    e.[EventID],
    e.[EventStart],
    e.[EventEnd],
    e.[EventTypeID]
FROM
    [Events] AS e
WHERE
    e.EventID IN (
        SELECT EventID
        FROM #EventTemp
        UNION
        SELECT EventID
        FROM PrevEvent
    )

SQL Server – How to Update Table Based on Conditions (Overlapping Dates)

Condition 1:

WITH ord as (
    SELECT ID, CustomerID, CheckInDate, CheckOutDate
        , n = ROW_NUMBER() over(partition by [CustomerID] order by [CheckInDate], [CheckOutDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.CustomerID, o1.CheckInDate, o1.CheckOutDate, o1.n
        , m = ROW_NUMBER() over(partition by o1.[CustomerID] order by o1.[CheckInDate], o1.[CheckOutDate])
    FROM ord o1
    INNER JOIN ord o2 ON o1.CustomerID = o2.CustomerID AND o2.n+1 = o1.n AND o1.CheckInDate > o2.CheckOutDate
), groups as (
    SELECT o.ID, o.CustomerID, nx = MIN(coalesce(f.n, 1)), n = MAX(o.n)
        , p = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
    FROM ord o
    LEFT JOIN first f ON o.CustomerID = f.CustomerID AND o.n < f.n
    GROUP BY o.ID, o.CustomerID
), dates as (
    SELECT g.CustomerID, g.nx, CheckInDate = MIN(o.CheckInDate)
        , CheckOutDate = CASE WHEN SUM(CASE WHEN o.CheckOutDate IS NULL THEN 1 END) IS NULL THEN MAX(o.CheckOutDate) END
    FROM groups g
    INNER JOIN ord o ON g.ID = o.ID
    GROUP BY g.nx, g.CustomerID
    HAVING COUNT(g.nx) > 1
)
SELECT o.ID, o.CustomerID
    , CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM groups g
INNER JOIN ord o ON g.ID = o.ID
INNER JOIN dates d on g.CustomerID = d.CustomerID AND g.nx = d.nx
ORDER BY ID

This query outputs the rows that must be updated. Each CTE does the following:

ord = partitions the rows by CustomerID and numbers them in order of CheckInDate (then CheckOutDate)
first = joins each row to the previous row of the same customer and keeps those whose CheckInDate falls after the previous row's CheckOutDate, i.e. rows that do not overlap their predecessor; these are then partitioned and numbered
groups = groups by the row number from the previous step to determine which group each row belongs to
dates = joins back to the original data to get the first and last date of each group; groups with only one row are removed
main select = outputs the dates on the row where p = 1 and NULL on the other rows

Output:

ID  CustomerID  CheckInDate CheckOutDate
1   1           2015-03-04  NULL
3   1           NULL        NULL
4   1           NULL        NULL
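To see what the ord and first steps of the Condition 1 query do on their own, the following sketch runs just the numbering and the consecutive-row join against the sample data. The aliases cur and prev and the column PrevCheckOutDate are names chosen here for readability, not part of the original query.

```sql
DECLARE @data TABLE ([ID] int, [CustomerID] int, [CheckInDate] date, [CheckOutDate] date);

INSERT INTO @data ([ID], [CustomerID], [CheckInDate], [CheckOutDate])
VALUES
    (1, 1, '2015-04-02', '2015-04-05'),
    (2, 2, '2015-03-04', '2015-05-02'),
    (3, 1, '2015-04-01', NULL),
    (4, 1, '2015-03-04', '2015-05-03'),
    (5, 1, '2015-01-03', '2015-02-03');

WITH ord AS (
    -- number each customer's stays in date order
    SELECT ID, CustomerID, CheckInDate, CheckOutDate
        , n = ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY CheckInDate, CheckOutDate)
    FROM @data
)
-- rows that start after the previous stay has ended (no overlap with it)
SELECT cur.ID, cur.CustomerID, cur.n, cur.CheckInDate
    , PrevCheckOutDate = prev.CheckOutDate
FROM ord AS cur
INNER JOIN ord AS prev
    ON prev.CustomerID = cur.CustomerID
   AND prev.n + 1 = cur.n
   AND cur.CheckInDate > prev.CheckOutDate
ORDER BY cur.CustomerID, cur.n;
```

With this data only ID 4 should come back: its check-in (2015-03-04) is later than the check-out of the preceding stay (2015-02-03). The row that follows the stay with a NULL check-out is not returned, because a comparison against NULL is never true.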

Condition 2:

WITH ord as (
    SELECT ID, CustomerID, CheckInDate, CheckOutDate
        , n = ROW_NUMBER() over(partition by [CustomerID] order by [CheckInDate], [CheckOutDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.CustomerID, o1.CheckInDate, o1.CheckOutDate, o1.n
        , m = ROW_NUMBER() over(partition by o1.[CustomerID] order by o1.[CheckInDate], o1.[CheckOutDate])
    FROM ord o1
    INNER JOIN ord o2 ON o1.CustomerID = o2.CustomerID AND o2.n+1 = o1.n AND o1.CheckInDate > o2.CheckOutDate
), groups as (
    SELECT o.ID, o.CustomerID, nx = MIN(coalesce(f.n, 1)), n = MAX(o.n)
        , p = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
        , last = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID DESC)
    FROM ord o
    LEFT JOIN first f ON o.CustomerID = f.CustomerID AND o.n < f.n
    GROUP BY o.ID, o.CustomerID
), dates as (
    SELECT g.CustomerID, g.nx, CheckInDate = MIN(o.CheckInDate)
        , CheckOutDate = MAX(o2.CheckOutDate)
    FROM groups g
    INNER JOIN ord o ON g.ID = o.ID
    INNER JOIN (SELECT ID, CustomerID, nx FROM groups WHERE last = 1) l ON g.CustomerID = l.CustomerID AND g.nx = l.nx
    INNER JOIN ord o2 ON l.ID = o2.ID
    GROUP BY g.nx, g.CustomerID
    HAVING COUNT(g.nx) > 1
)
SELECT o.ID, o.CustomerID
    , CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM groups g
INNER JOIN ord o ON g.ID = o.ID
INNER JOIN dates d on g.CustomerID = d.CustomerID AND g.nx = d.nx
ORDER BY ID

Output:

ID  CustomerID  CheckInDate CheckOutDate
1   1           2015-03-04  2015-05-03
3   1           NULL        NULL
4   1           NULL        NULL

For updates, replace the final SELECT with an UPDATE that targets the alias exposing the date columns (o here) and drop the ORDER BY:

UPDATE o SET 
    CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM ...
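As a rough sketch of the UPDATE ... FROM shape, the example below joins the base table to a CTE and updates the base table's alias. The merged CTE is a deliberately simplified, hypothetical stand-in for the groups and dates CTEs above (it treats all of a customer's rows as one group), and the alias t is an assumption.

```sql
DECLARE @data TABLE ([ID] int, [CustomerID] int, [CheckInDate] date, [CheckOutDate] date);

INSERT INTO @data ([ID], [CustomerID], [CheckInDate], [CheckOutDate])
VALUES
    (1, 1, '2015-04-02', '2015-04-05'),
    (3, 1, '2015-04-01', NULL),
    (4, 1, '2015-03-04', '2015-05-03');

WITH merged AS (
    -- simplified stand-in: one merged range per customer
    SELECT CustomerID
        , FirstID      = MIN(ID)
        , CheckInDate  = MIN(CheckInDate)
        , CheckOutDate = MAX(CheckOutDate)
    FROM @data
    GROUP BY CustomerID
)
UPDATE t SET
    -- the first row of the group keeps the merged range, the others are blanked
    CheckInDate    = CASE WHEN t.ID = m.FirstID THEN m.CheckInDate END
    , CheckOutDate = CASE WHEN t.ID = m.FirstID THEN m.CheckOutDate END
FROM @data AS t
INNER JOIN merged AS m ON t.CustomerID = m.CustomerID;

SELECT ID, CustomerID, CheckInDate, CheckOutDate
FROM @data
ORDER BY ID;
```

After the update, ID 1 should hold 2015-03-04 to 2015-05-03 and IDs 3 and 4 should have NULL dates, which is the same shape as the Condition 2 output. For the real data, the full CTE chain would take the place of merged.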

Your data:

declare @data table([ID] int, [CustomerID] int, [CheckInDate] date, [CheckOutDate] date);
Insert into @data([ID], [CustomerID], [CheckInDate], [CheckOutDate])
VALUES
    (1, 1, '2015-04-02', '2015-04-05'),
    (2, 2, '2015-03-04', '2015-05-02'),
    (3, 1, '2015-04-01', NULL),
    (4, 1, '2015-03-04', '2015-05-03'),
    (5, 1, '2015-01-03', '2015-02-03')
;

It also works with this sample:

(1, 1, '2015-04-02', '2015-04-05'),
(2, 2, '2015-03-04', '2015-05-02'),
(3, 1, '2015-04-01', NULL),
(4, 1, '2015-03-04', '2015-05-03'),
(5, 1, '2015-01-03', '2015-02-03'),
(6, 1, '2015-01-02', '2015-02-03'),
(7, 1, '2015-03-04', '2015-03-06'),
(8, 1, '2015-03-04', '2015-05-06'),
(9, 1, '2014-04-02', '2014-04-05'),
(10, 1, '2014-03-04', '2014-05-02')

If it does not work with some of your data, update the input and output tables in your question with more representative values.
