Sql-server – First and last records in a string of linked rows

sql serversql-server-2012

I have been handed an (ancient) database in SQLServer format containing rental records. These records can be modified by the customer while the rental is still in flight, if they want the drill for another 24 hours for instance. To indicate this, a new row is written to the database covering that additional period, and a pointer is left in the original record to say "this one modified me". The result is something like this:

pk   custId    prodId   overrideByPk   startDate   endDate
----------------------------------------------------------
1    1         1        0              1-1-2000    2-1-2000
2    1         1        3              1-2-2000    2-2-2000
3    1         1        4              3-2-2000    4-2-2000
4    1         1        0              4-2-2000    5-2-2000

These records represent two rental terms. The first ran for a single day in January, and the second ran from the 1st to the 5th of February. My goal is to produce output like this…

pk   custId    prodId   startDate   endDate
--------------------------------------------
1    1         1        1-1-2000    2-1-2000
2    1         1        1-2-2000    5-2-2000

It seems, based on the data I've looked at so far, that a record with a overrideByPk=0 is the last record in a string, even in a string of one row. Note that my example has separate date periods, but that's only for clarity. The customer may indeed return the item and the pick it back up the same day, so you can't look for discontinuities in the dates.

I can do this in C# code by getting all the rows with zero, and then looking for those overrideByPk's repeatedly, but there are thousands of these and I am concerned about performance and code complexity. I seem to recall solving this in SQL in the past, but can no longer remember the trick (if it existed).

Best Answer

If you are using SQL 2012 as your tag indicates, then you can do this with a recursive CTE, although it's a little different since you find the child first and then the parents. But Take a look at the sample below.

The Setup

DECLARE @Rentals TABLE
    (
    PK INT NOT NULL
    , CustID INT NOT NULL
    , ProdID INT NOT NULL
    , OverrideByPK INT NOT NULL
    , StartDate DATE NOT NULL
    , EndDate DATE NOT NULL
    );

INSERT INTO @Rentals
        ( PK
        , CustID
        , ProdID
        , OverrideByPK
        , StartDate
        , EndDate
        )
VALUES  ( 1,1,1,0,'1/1/2000','2/1/2000')      
    , ( 2,1,1,3,'1/2/2000','2/2/2000')      
    , ( 3,1,1,4,'3/2/2000','4/2/2000')      
    , ( 4,1,1,0,'4/2/2000','5/2/2000');

The Query

WITH CTE_Rentals AS
    (
    SELECT R.PK AS PK
        , R.OverrideByPK AS NextPK
        , R.PK AS OldestPK
    FROM @Rentals AS R
    WHERE R.OverrideByPK = 0
    UNION ALL
    SELECT R.PK AS LastPK
        , R.overrideByPK
        , CR.OldestPK
    FROM @Rentals AS R
        INNER JOIN CTE_Rentals AS CR ON R.OverrideByPK = CR.PK
    WHERE R.OverrideByPK <> 0
    )
, CTE_Single AS
    (
    SELECT C.OldestPK AS LastPK
        , MIN(C.PK) AS FirstPK
    FROM CTE_Rentals AS C
    GROUP BY C.OldestPK 
    )
SELECT S.FirstPK
    , FirstRental.CustID
    , FirstRental.ProdID
    , FirstRental.StartDate
    , LastRental.EndDate
FROM CTE_Single AS S
    INNER JOIN @Rentals AS FirstRental ON FirstRental.PK = S.FirstPK
    INNER JOIN @Rentals AS LastRental ON LastRental.PK = S.LastPK;