Sql-server – Finding the Min value for a subgroup, filtering out preceding Min values within the same parent group

sql-server-2016t-sql

How do I get the minimum value for a group (Acc, TranType), but filtering out any minimum values used in preceding rows for the Acc group. Preceding rows would be defined as Acc asc, TranType asc.

The PosCancelID should only appear once per Acc group. But the same PosCancelID could appear within another Acc Group within the data set.

So with the given data set:

Acc  | TranType | PosCancelID
100     1               2
808     1               5
808     1               4
808     2               5
808     2               4<--To be filtered from min calc as it min for (808,1)
813     2               3
813     4               3<--To be filtered from min calc as it min for (813,2)
809     1               3
809     1               4
809     2               3<--To be filtered from min calc as (809,1) uses it
809     2               4 
809     3               4<--To be filtered from min calc as (809,2) uses it

I should get:

Acc  | TranType | PosCancelID
100     1               2
808     1               4
808     2               5
813     2               3
809     1               3
809     2               4

SELECT ACC, TranType, min(maxPreceeding) as ActualCancelID 
FROM 
(
   SELECT ACC, TranType,  
          MAX(m.posCancelID) OVER (PARTITION BY m.ACC 
                                  ORDER BY m.TranType, m.posCancelID 
                                  ROWS UNBOUNDED PRECEDING) as maxPreceeding
    FROM MCancel as m
) AS x
GROUP BY ACC, TranType

The above query gives me almost what i want but isn't filtering for acc = 813. So I know there must be a better (actually applying a filter to remove previous minimum values) way.

Best Answer

Quite a hard problem. Here is a recursive solution:

WITH 
  rcte AS
  ( SELECT TOP (1)
        Acc, TranType, posCancelID,
        CAST('=' + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX)) AS IDs 
    FROM
        MCancel
    ORDER BY
        Acc, TranType, posCancelID

    UNION ALL

    SELECT
        Acc, TranType, posCancelID,
        CAST(IDs + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX))
    FROM 
      ( SELECT
            m.*, 
            r.IDs, 
            ROW_NUMBER() OVER (ORDER BY m.Acc, m. TranType, m.PosCancelID) AS rn
        FROM
            rcte AS r 
            JOIN MCancel AS m
                ON  (m.Acc = r.Acc AND m.TranType > r.TranType)
                OR  (m.Acc > r.Acc)
        WHERE
            r.IDs NOT LIKE ('%=' + CAST(m.posCancelID AS VARCHAR(20)) + '=%')
      ) AS mc
    WHERE
        rn = 1
  )
SELECT Acc, TranType, posCancelID
FROM rcte
ORDER BY Acc, TranType ;

The solution assumes that a posCancelID should not appear twice in the result set. If the requirements are that they should not appear twice in the same Acc group, then the solution needs a slight adjustment:

WITH rcte AS
  ( SELECT 
      Acc, TranType, posCancelID,
      CAST('=' + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX)) AS IDs 
    FROM
      ( SELECT 
          Acc, TranType, posCancelID,
          ROW_NUMBER() OVER (PARTITION BY Acc ORDER BY TranType, PosCancelID) AS rnk
        FROM MCancel
      ) AS f
    WHERE rnk = 1

    UNION ALL

    SELECT
      Acc, TranType, posCancelID,
      CAST(IDs + CAST(posCancelID AS VARCHAR(20)) + '=' AS VARCHAR(MAX))
    FROM
      ( SELECT
          m.*, r.IDs, 
          ROW_NUMBER() OVER (PARTITION BY m.Acc
                             ORDER BY m.TranType, m.PosCancelID) AS rn
        FROM
          rcte AS r 
          JOIN MCancel AS m
              ON  (m.Acc = r.Acc AND m.TranType > r.TranType)
        WHERE
          r.IDs NOT LIKE ('%=' + CAST(m.posCancelID AS VARCHAR(20)) + '=%')
      ) AS mc
    WHERE rn = 1
  )
SELECT Acc, TranType, posCancelID
FROM rcte
ORDER BY Acc, TranType ;

Both are tested at dbfiddle.uk

Related Solutions

Sql-server – Select when group by contains more than one row with the same value

Your idea is right but you need HAVING COUNT(DISTINCT design) > 1

Like this:

WITH multi_design_perfs AS
(SELECT b.perf_id 
FROM perf_ticket_type b
GROUP BY b.perf_id
HAVING COUNT(DISTINCT b.design) > 1
)
SELECT m.perf_id, 
    STUFF((select ', ' + CAST(b.design AS varchar(10))
     FROM perf_ticket_type b 
     WHERE b.perf_id = m.perf_id
     ORDER BY b.design
     FOR XML PATH(''), TYPE).value('.','varchar(max)'),1,2,'')
FROM multi_design_perfs m
;

Sql-server – Minimum distance of a location; one table of points compared to a set of points SQL 2012

You may have better performance if you convert the function into a table-valued-function.

Here, I setup the test-bed:

USE tempdb;
CREATE TABLE dbo.TableA
(
    LAT DECIMAL(10,5)
    , LON DECIMAL(10,5)
);

CREATE TABLE TableB
(
    Geolocation GEOGRAPHY NOT NULL
);
GO

Here is the table-valued-function, which is basically your function except this returns a table.

CREATE FUNCTION dbo.fxn_distance
(
    @pointTableA GEOGRAPHY
)
returns table
as return 
(
    SELECT TOP 1 Distance = b.GeoLocation.STDistance(@pointTableA) 
    FROM TableB b  
    WHERE geolocation.STDistance(@pointTableA) IS NOT NULL
    ORDER BY geolocation.STDistance(@pointTableA)
)
GO

Insert a simple test-row into each of the two tables:

INSERT INTO dbo.TableA(LAT, LON)
VALUES (49.0,170.0);

INSERT INTO dbo.TableB(Geolocation)
VALUES (geography::STGeomFromText(
    'LINESTRING(-122.360 47.656, -122.343 47.656)', 4326)
);

The query that uses the inline TVF to determine the closest point:

SELECT a.*
    , d.Distance --This is the distance calculated by the TVF. 
FROM dbo.TableA a
CROSS APPLY dbo.fxn_distance(geography::STPointFromText('POINT(' + 
       CAST(A.LON AS VARCHAR(20)) + ' ' + CAST(A.LAT AS 
       VARCHAR(20)) + ')', 4326)) d ;

Best Answer

Related Solutions

Sql-server – Select when group by contains more than one row with the same value

Sql-server – Minimum distance of a location; one table of points compared to a set of points SQL 2012

Related Question