Sql-server – Remove parent and child rows if child or parent fits criteria

sql server

I have the following table:

OrderID | OldOrderID    | Action    | EntryDate | Source 
    1   |     NULL      |   Insert  | 2016-01-12|   A
    1   |     NULL      |   Remove  | 2016-01-13|   A
    2   |     NULL      |   Insert  | 2016-01-12|   B
    3   |     NULL      |   Insert  | 2016-01-12|   C
    4   |       3       |   Insert  | 2016-01-13|   C
    4   |     NULL      |   Remove  | 2016-01-14|   C

I'm trying to query all rows that dont have the Action Remove. A row can also have a child (a child can also have a child). If a child get the Action Remove it should also remove the parent and so on until it gets to the first parent.

The query I'm currently using fails to remove the parent if a child is removed.

WITH Active AS 
(
  SELECT *, rn = ROW_NUMBER() 
    OVER (PARTITION BY OrderID,Source ORDER BY EntryDate DESC)
  FROM Orders
)
SELECT *
FROM Active WHERE [Action] <> 'Remove' AND rn = 1;

One other problem is that OldOrderID can be 0 or equal to OrderID insted of NULL.

Can this problem be solved with SQL only or should I programaticly filter a simple query?

EDIT:
As I might have explained it badly some extra information:

Each row can make a child of its self ( New OrderID, OldOrderID = prevoius OrderID). Or make a "clone" of its self with a diffrent Action, but the same OrderID.

A child row will be inserted with a OrderID and an OldOrderID that refrences a OrderID from a prevoius row. If a child is updated to indicate it has been removed it will get a new row with the same OrderID, but the OldOrderID might be NULL or it might still contain a value indicating wich row is its parent.

The Source column idicates where the data came from as this data is aggreggated in a single table from 5 difrent sources. The same OrderID number can exitst on multiple sources, but it dosent mean its the same order. Each row also has a Uniqe ID, but I didnt include it in the table as it didn't seem useful.

The exception are rows that get fake children. They have OldOrderID equal to 0 or is the same as OrderID – in this case its the same as if OldOrderID was equal to NULL.

In the example table I only need the row with the OrderID 2. OrderID 1 gets a "clone" and is removed. OrderID 3 gets a child with OrderID 4. This child then gets removed. So the parent, 3, should also get removed.

I hope this clears some of the questions.

Best Answer

This is one way to do it:

IF OBJECT_ID('tempdb..#Orders') IS NOT NULL
    DROP TABLE #Orders;

CREATE TABLE #Orders (
    OrderID int,
    OldOrderID int,
    Action varchar(10),
    EntryDate date,
    Source char(1)
)

-- Your sample data, plus a couple more rows 
--to demonstrate the recursive query
INSERT INTO #Orders
VALUES 
    (    1   ,     NULL      ,   'Insert'  , '2016-01-12',   'A'),
    (    1   ,     NULL      ,   'Remove'  , '2016-01-13',   'A'),
    (    2   ,     NULL      ,   'Insert'  , '2016-01-12',   'B'),
    (    3   ,     NULL      ,   'Insert'  , '2016-01-12',   'C'),
    (    4   ,       3       ,   'Insert'  , '2016-01-13',   'C'),
    (    4   ,     NULL      ,   'Remove'  , '2016-01-14',   'C'),
    (    5   ,       4       ,   'Insert'  , '2016-01-15',   'C'),
    (    5   ,     NULL      ,   'Remove'  , '2016-01-16',   'C');


WITH SourceData AS (
    -- Crosstab to get all orders on a single row
    -- The assumption is you can get just two actions (insert and remove)
    SELECT OrderId, 
        ParentOrder = MAX(OldOrderID), 
        InsertDate = MAX(CASE WHEN Action = 'Insert' THEN EntryDate END), 
        RemoveDate = MAX(CASE WHEN Action = 'Remove' THEN EntryDate END), 
        Source = MAX(Source)
    FROM #Orders
    GROUP BY OrderId
),
OrderHierarchy AS (
    -- Recursive query to get top OrderId
    SELECT *, 
        OrderID AS TopOrderId
    FROM SourceData
    WHERE ParentOrder IS NULL

    UNION ALL

    SELECT SD.*, OH.TopOrderId
    FROM SourceData AS SD
    INNER JOIN OrderHierarchy AS OH
        ON SD.ParentOrder = OH.OrderID
),
AllOrders AS (
    -- Get MAX(RemoveData) for the order chain
    SELECT *, MaxRemovedDate = MAX(RemoveDate) OVER (PARTITION BY TopOrderId)
    FROM OrderHierarchy
)
-- Return just the rows that have no remove date in the chain
SELECT *
FROM AllOrders
WHERE MaxRemovedDate IS NULL;