GROUP BY using columns not in the SELECT list – when is this practical, elegant or powerful

group by

TL;DR – I would like examples of where using a GROUP BY (example shown below) using columns not in the SELECT list can be used to resolve SQL challenges in a [practical | elegant | powerful] way. I mean in a general way – the example demonstrated below is interesting because it demonstrates the principle (but it doesn't work!). I want working examples where use of this technique can be used to achieve something "significant".

Following up to this question, which required the counting of routes from point_x to point_y and vice-versa, taking routes between the same points to be equivalent – i.e. A -> B is the same for the count purposes as B -> A. A working general solution to that question is given there.

However, one poster (SQLRaptor) showed a solution which solved the problem for the (presumably) subset of data shown in the question, but this poster also said that their solution wasn't general and asked the OP could they see why? SQLRaptor also said this was an example of a GROUP BY using columns not in the SELECT list and that this could be an elegant (powerful) solution to some SQL challenges!

SQLRaptor's solution (using my own notation from answer) was:

SELECT  MIN(origin) AS point_1, 
        MAX(destination) AS point_2,
        COUNT(*) AS journey_count
FROM route
GROUP BY ASCII(origin) * ASCII(destination)
ORDER BY point_1, point_2 

I thought the non-generality might be something to do with the multiplication (duplicates), but had no concrete proof. This transpired to be the case – see my proof here and here.

Unfortunately, this example of using a GROUP BY using columns not in the SELECT list doesn't work very well – what I would like are examples of where this technique can work well?

Best Answer

I have used this technique (grouping on a column not used in the SELECT list) to combine an ordered list of values (which has gaps) into ranges of contiguous values.

First I do a rownum() on the ordered set of values. Then we diff the rownum() and the value. By grouping on the diff, we get the ranges.

CREATE TABLE #TMP (ID INT)

INSERT INTO #TMP 
SELECT 1 UNION 
SELECT 2 UNION 
SELECT 3 UNION 
SELECT 4 UNION 
SELECT 6 UNION 
SELECT 8 UNION 
SELECT 9 UNION 
SELECT 10 UNION 
SELECT 15 UNION 
SELECT 16 UNION 
SELECT 17 UNION 
SELECT 18 UNION 
SELECT 19 UNION 
SELECT 20



WITH RN AS 
(
    SELECT 
        ROW_NUMBER() OVER (ORDER BY ID) AS RN, ID 
    FROM 
        #TMP
),
SRC AS 
(
    SELECT 
        RN, 
        ID, 
        ID-RN DIFF 
    FROM 
        RN
)
SELECT 
    MIN(ID) RANGE_START, 
    MAX(ID) RANGE_END, 
    COUNT(*) CNT_VALUES_IN_RANGE 
FROM 
    SRC 
GROUP BY 
    DIFF