Sql-server – Will altering order of columns in GROUP BY affect performance

azure-sql-databaseperformancequery-performancesql server

I have a SQL query like this:

SELECT A, B, (CASE WHEN C=0 THEN 0 ELSE 1 END), COUNT(D)
FROM SomeTable
GROUP BY A, B, (CASE WHEN C=0 THEN 0 ELSE 1 END)

On a huge dataset the actual execution plan attributes most of the time to a node that corresponds to logical operation Aggregate and physical operation HashMatch.

Will changing the order in which the columns and conditions are listed in the GROUP BY list affect performance?

Best Answer

Despite what MSDN documentation says, no, it doesn't matter for GROUP BY queries.

You can test it here, at: SQL-Fiddle test (SQL-Server 2012)

CREATE TABLE test
( id INT IDENTITY(1,1)
, a  INT NOT NULL
, b  INT NOT NULL
, c  INT NOT NULL
, d  INT NOT NULL
, PRIMARY KEY (id)
) ;

CREATE NONCLUSTERED INDEX a_b_c_include_d_index
    ON test (a, b, c)
    INCLUDE (d) ;

INSERT INTO test
(a, b, c, d)
VALUES
 ... some 50K rows

The queries:

SELECT COUNT(*) AS num
     , MIN(cnt) AS min_count
     , MAX(cnt) AS max_count
     , MIN(min_d) AS min_min_d
     , MAX(min_d) AS max_min_d
     , MIN(max_d) AS min_max_d
     , MAX(max_d) AS max_max_d
FROM
  ( SELECT a
         , b
         , c
         , COUNT(d) AS cnt
         , MIN(d) AS min_d
         , MAX(d) AS max_d
    FROM test 
    GROUP BY a, b, c             --- or with:  `GROUP BY b, a, c`
  ) AS grp ;

produce the same execution plan:

execution plan

MICROSOFT SQL SERVER 2005 XML SHOWPLAN
SELECT
Compute Scalar
Cost: 0%
Stream Aggregate
(Aggregate)
Cost: 0%
Compute Scalar
Cost: 10%
Stream Aggregate
(Aggregate)
Cost: 10%
Index Scan
[a_b_c_include_d_index].[test]
Cost: 89%

Now if you change that Group by to:

    GROUP BY a, b, c
      WITH ROLLUP

it does make a difference and it does produce different execution plans (and different result sets of course). It still uses the index though, at least in that SQL-Fiddle test. It the mismatching order case, it gets the data from the index but it has to do an intermediate sort (to calculate the rollup values).