Sql-server – Finding the Min value for a subgroup, filtering out preceding Min values within the same parent group

sql-server-2016t-sql

How do I get the minimum value for a group (Acc, TranType), but filtering out any minimum values used in preceding rows for the Acc group. Preceding rows would be defined as Acc asc, TranType asc.

The PosCancelID should only appear once per Acc group. But the same PosCancelID could appear within another Acc Group within the data set.

So with the given data set:

Acc  | TranType | PosCancelID
100     1               2
808     1               5
808     1               4
808     2               5
808     2               4<--To be filtered from min calc as it min for (808,1)
813     2               3
813     4               3<--To be filtered from min calc as it min for (813,2)
809     1               3
809     1               4
809     2               3<--To be filtered from min calc as (809,1) uses it
809     2               4 
809     3               4<--To be filtered from min calc as (809,2) uses it

I should get:

Acc  | TranType | PosCancelID
100     1               2
808     1               4
808     2               5
813     2               3
809     1               3
809     2               4

SELECT ACC, TranType, min(maxPreceeding) as ActualCancelID 
FROM 
(
   SELECT ACC, TranType,  
          MAX(m.posCancelID) OVER (PARTITION BY m.ACC 
                                  ORDER BY m.TranType, m.posCancelID 
                                  ROWS UNBOUNDED PRECEDING) as maxPreceeding
    FROM MCancel as m
) AS x
GROUP BY ACC, TranType

The above query gives me almost what i want but isn't filtering for acc = 813. So I know there must be a better (actually applying a filter to remove previous minimum values) way.

Best Answer

Quite a hard problem. Here is a recursive solution:

WITH 
  rcte AS
  ( SELECT TOP (1)
        Acc, TranType, posCancelID,
        CAST('=' + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX)) AS IDs 
    FROM
        MCancel
    ORDER BY
        Acc, TranType, posCancelID

    UNION ALL

    SELECT
        Acc, TranType, posCancelID,
        CAST(IDs + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX))
    FROM 
      ( SELECT
            m.*, 
            r.IDs, 
            ROW_NUMBER() OVER (ORDER BY m.Acc, m. TranType, m.PosCancelID) AS rn
        FROM
            rcte AS r 
            JOIN MCancel AS m
                ON  (m.Acc = r.Acc AND m.TranType > r.TranType)
                OR  (m.Acc > r.Acc)
        WHERE
            r.IDs NOT LIKE ('%=' + CAST(m.posCancelID AS VARCHAR(20)) + '=%')
      ) AS mc
    WHERE
        rn = 1
  )
SELECT Acc, TranType, posCancelID
FROM rcte
ORDER BY Acc, TranType ;

The solution assumes that a posCancelID should not appear twice in the result set. If the requirements are that they should not appear twice in the same Acc group, then the solution needs a slight adjustment:


WITH rcte AS
  ( SELECT 
      Acc, TranType, posCancelID,
      CAST('=' + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX)) AS IDs 
    FROM
      ( SELECT 
          Acc, TranType, posCancelID,
          ROW_NUMBER() OVER (PARTITION BY Acc ORDER BY TranType, PosCancelID) AS rnk
        FROM MCancel
      ) AS f
    WHERE rnk = 1

    UNION ALL

    SELECT
      Acc, TranType, posCancelID,
      CAST(IDs + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX))
    FROM
      ( SELECT
          m.*, r.IDs, 
          ROW_NUMBER() OVER (PARTITION BY m.Acc
                             ORDER BY m.TranType, m.PosCancelID) AS rn
        FROM
          rcte AS r 
          JOIN MCancel AS m
              ON  (m.Acc = r.Acc AND m.TranType > r.TranType)
        WHERE
          r.IDs NOT LIKE ('%=' + CAST(m.posCancelID AS VARCHAR(20)) + '=%')
      ) AS mc
    WHERE rn = 1
  )
SELECT Acc, TranType, posCancelID
FROM rcte
ORDER BY Acc, TranType ;

Both are tested at dbfiddle.uk