Sql-server – Group by multiple columns, agregate others and select all in SQL Server

aggregategroup bypartitioningsql server

I am using SQL server and I can't seem to construct the query I want. I have a table with several columns, among them are

PARAMETER_NAME, GW_LOCATION_ID, Report_Result, DETECT_FLAG

I want a query to return

  1. a row for each unique Parameter-Location combination
  2. the maximum value of the Report_Result column for that unique Parameter-Location combination and
  3. the DETECT_FLAG value associated with the maximum value.

There are other columns in addition to DETECT_FLAG that I want to return but I think that if I can get this part worked out, I should be able to return the other columns similarly.

The query works if I group by PARAMETER_NAME and GW_LOCATION_ID and aggregate the Report_Result column (see below). However, when I add the DETECT_FLAG column, I get the error, "Column 'SLVs_Flagged.DETECT_FLAG' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." What I want is the value of DETECT_FLAG for the row returned by MAX(Report_Result).

SELECT 
PARAMETER_NAME, GW_LOCATION_ID, MAX(Report_Result)
FROM SLVs_Flagged
GROUP BY PARAMETER_NAME, GW_LOCATION_ID
ORDER BY PARAMETER_NAME, GW_LOCATION_ID;

I have tried to do a subselect to return the DETECT_FLAG value that corresponds to MAX(Report_Result) but I try to use a WHERE condition and I get another error. Please advise how I can execute this query.

Here is a subset of the data for testing.

PARAMETER_NAME  GW_LOCATION_ID  Report_Result   DETECT_FLAG
Perchlorate CDBO-6  2.38    N
Perchlorate CDBO-6  1.45    N
Perchlorate CDV-16-02655    4   N
Perchlorate CDV-16-02655    0.537   Y
Perchlorate CDV-16-02655    4   N
Perchlorate CDV-16-02656    100 N
Perchlorate CDV-16-02656    0.394   Y
Perchlorate CDV-16-02656    4   N
Perchlorate CDV-16-02656    4   N
Perchlorate CDV-16-02657    4   N
Perchlorate CDV-16-02657    4   N
Perchlorate CDV-16-02657    4   N
Perchlorate CDV-16-02657    0.174   Y
Perchlorate CDV-16-02658    4   N
Perchlorate CDV-16-02658    4   Y
Perchlorate CDV-16-02658    0.126   Y
Perchlorate CDV-16-02658    0.0561  Y
Perchlorate CDV-16-02658    20  N
Perchlorate CDV-16-02658    4   N
Perchlorate CDV-16-02659    4   N
Nitrate as Nitrogen R-16 S4 0.003   N
Nitrate as Nitrogen R-20 S1 0.003   N
Nitrate as Nitrogen R-20 S1 0.003   N
Nitrate as Nitrogen R-20 S1 0.003   N
Nitrate as Nitrogen R-20 S2 0.003   N
Nitrate as Nitrogen R-20 S2 0.003   N
Nitrate as Nitrogen R-20 S3 0.003   N
Nitrate as Nitrogen R-20 S3 0.003   N
Nitrate as Nitrogen R-20 S3 0.003   N
Nitrate as Nitrogen R-27    0.003   N
Nitrate as Nitrogen R-31 S2 0.003   N
Nitrate as Nitrogen R-32 S3 0.003   N
Nitrate as Nitrogen R-32 S3 0.003   N
Nitrate as Nitrogen Test Well 1A    0.01    N
Nitrate as Nitrogen Test Well 1A    -0.01   N
Nitrate as Nitrogen Test Well 2 0.04    N
Nitrate as Nitrogen Test Well 2 0.01    N
Nitrate as Nitrogen Test Well 2A    0   N
Nitrate as Nitrogen Test Well 3 0.04    N
Nitrate as Nitrogen Test Well 3 0.04    N
Nitrate as Nitrogen Test Well 4 0.04    N
Nitrate as Nitrogen Test Well 4 0.04    N
Nitrate as Nitrogen Test Well 4 0.04    N
Nitrate as Nitrogen Test Well 4 0.04    N
Nitrate as Nitrogen Test Well 4 0.04    N
Nitrate as Nitrogen Test Well 4 0.04    N
Nitrate as Nitrogen Test Well 4 0.04    N
Nitrate as Nitrogen Test Well 8 0.04    N
Nitrate as Nitrogen Test Well 8 0.04    N

Best Answer

This is a greatest-n-per-group problem and there are many ways to solve it (CROSS APPLY, window functions, subquery with GROUP BY, etc). Here's a method using window functions and a CTE:

WITH ct AS
  ( SELECT *,
           rn = RANK() OVER (PARTITION BY PARAMETER_NAME, GW_LOCATION_ID
                             ORDER BY Report_Result DESC)
    FROM SLVs_Flagged
  )
SELECT PARAMETER_NAME, GW_LOCATION_ID, 
       Max_Report_Result = Report_Result,
       DETECT_FLAG
       -- more columns
FROM ct
WHERE rn = 1
ORDER BY PARAMETER_NAME, GW_LOCATION_ID ;

The query will return all tied results (if there are ties). If you want a single result for every (PARAMETER_NAME, GW_LOCATION_ID) combination, you can resolve ties by using ROW_NUMBER() instead of RANK() and modifying the ORDER BY inside the OVER (..) clause. Eg. (prefer DETECT_FLAG with N over Y):

           rn = ROW_NUMBER() OVER (PARTITION BY PARAMETER_NAME, GW_LOCATION_ID
                                   ORDER BY Report_Result DESC, DETECT_FLAG)