Sql-server – How to change aggregate function without duplicating SQL (by using SQL)

aggregatecteperformancequery-performancesql-server-2016stored-procedures

In SQL Server 2016 I have a scenario where data will be processed according to different aggregation functions in a large GROUP BY ROLLUP. I would like to have a stored procedure that has a parameter that specifies which aggregation function to use to describe the groupings in a way that does not risk SQL injection and takes advantage of compilation (it is a heavy stored procedure).

My thoughts are to use a collection of queries that summarize the data's groupings on a particular aggregate function. (e.g. agg.DataMin, agg.DataMedian, agg.DataWeightedAverage, and so on). Then use these with the parameter in a CTE

WITH AggData AS
(
   SELECT * FROM agg.DataMin WHERE @AggFunction = 1 
   UNION ALL
   SELECT * FROM agg.DataMedian WHERE @AggFunction = 2 
   UNION ALL
   SELECT * FROM agg.DataWeightedAverage WHERE @AggFunction = 3
)
SELECT ...

My concerns are query performance and industry best practice. The data table is of a reasonable size (2+ Gig). I will have to add many aggregate queries with some being inline table-valued functions for some leave-out aggregations.

In the above, will the queries/table-valued functions only execute when the @AggFunction matches the WHERE condition or will they all execute and filter after the results are returned? If the latter, is there a method to short-circuit the evaluation of the unneeded queries at run-time? Also, is there some standard method to perform this in SQL that I have overlooked?

Best Answer

Contradiction Detection could kick in to make sure only one of the statements is run, and in my simple test it did as long as there was a statement-level recompile hint, but why risk it? For example:

USE tempdb
GO

-- CREATE SCHEMA agg
--DROP TABLE agg.DataMin
--DROP TABLE agg.DataMedian
--DROP TABLE agg.DataWeightedAverage
--GO

CREATE TABLE agg.DataMin ( x INT PRIMARY KEY )
CREATE TABLE agg.DataMedian ( x INT PRIMARY KEY )
CREATE TABLE agg.DataWeightedAverage ( x INT PRIMARY KEY )
GO

INSERT INTO agg.DataMin ( x )
SELECT object_id FROM sys.all_objects

INSERT INTO agg.DataMedian ( x )
SELECT object_id FROM sys.all_objects WHERE type = 'P'

INSERT INTO agg.DataWeightedAverage ( x )
SELECT object_id FROM sys.all_objects WHERE type = 'X'
GO


-- Are there some situations when it wouldn't...
DECLARE @AggFunction INT = 1

;WITH AggData AS
(
   SELECT * FROM agg.DataMin WHERE @AggFunction = 1 
   UNION ALL
   SELECT * FROM agg.DataMedian WHERE @AggFunction = 2 
   UNION ALL
   SELECT * FROM agg.DataWeightedAverage WHERE @AggFunction = 3
)
SELECT *
FROM AggData
OPTION ( RECOMPILE )

My results: Recompile in action

In this simple example, only one table is scanned on the left with the recompile, and 3 tables are scanned on the right, without the recompile. The recompile hint allows the optimizer to "see" the parameter value and act accordingly. In a stored procedure where parameter sniffing would be used, a recompile would also be needed to get the same behaviour, either at statement or stored-proc level.

However I cannot say if there are no situations where contradiction detection would not occur; and you can't prove a negative. To put it another way, I cannot prove contradiction detection would always occur even with a recompile. There may be some unknown situations where even with a recompile it does not occur; excessive complexity springs to mind.

Also, there is no real advantage to using the CTE in your example, so why not keep it simple? You could just write some simple procedural SQL with IF...THEN...ELSE which would guarantee only one of your statements would fire, eg

DECLARE @AggFunction INT = 99

IF @AggFunction = 1
   SELECT * FROM agg.DataMin
ELSE IF @AggFunction = 2
   SELECT * FROM agg.DataMedian
ELSE IF @AggFunction = 3
   SELECT * FROM agg.DataWeightedAverage
ELSE
    RAISERROR( 'Unknown value for parameter @AggFunction (%i).', 16, 1, @AggFunction )

Add some parameter checking while you're at it. Hopefully this meets your requirements of guaranteeing only one statement is compiled when needed, is safe and hopefully simple to implement.

HTH