Sql-server – Does SQL Server Cache Aggregate Results When Duplicated Across Columns

sql serversql-server-2012

Suppose we have a table Orders containing the columns order_id, total, discount

Then we write a query similar to the following

SELECT
    COUNT(order_id) AS num_orders
    , SUM(total) / COUNT(order_id) as avg_total
    , SUM(discount) / COUNT(order_id) AS avg_discount
FROM Orders

Is the value for COUNT(order_id) Preserved across columns or re-computed (Eg, is there a performance hit)? Or is it better to determine the computed value(s) first and used those in the query, for example:

DECLARE @order_count AS INT 

SELECT 
    @order_count = COUNT(order_id) 
FROM Orders

SELECT
    @order_count AS num_orders
    , SUM(total) / @order_count as avg_total
    , SUM(discount) / @order_count AS avg_discount
FROM Orders

Note that, while writing this question I noticed that since SQL Server 2008 AVG() is supported. However, I continued this question and intend this to be more general to wanting to understand how SQL server handles Identical aggregates across columns as I do sometimes run into this in other forms.

Best Answer

SQL Server only calculates the COUNT once. You can see this by looking at the properties of the execution plan for

create table Orders(order_id int, total int, discount int)


SELECT
    COUNT(order_id) AS num_orders
    , SUM(total) / COUNT(order_id) as avg_total
    , SUM(discount) / COUNT(order_id) AS avg_discount
FROM Orders

enter image description here

The stream aggregate (1) has the following defined values

[Expr1008] = Scalar Operator(COUNT([tempdb].[dbo].[Orders].[order_id])), 
[Expr1009] = Scalar Operator(COUNT_BIG([tempdb].[dbo].[Orders].[total])), 
[Expr1010] = Scalar Operator(SUM([tempdb].[dbo].[Orders].[total])), 
[Expr1011] = Scalar Operator(COUNT_BIG([tempdb].[dbo].[Orders].[discount])), 
[Expr1012] = Scalar Operator(SUM([tempdb].[dbo].[Orders].[discount]))

Expr1008 is the calculation of the COUNT that you ask about.

There are some other COUNT aggregates for the other two columns. These are needed because the correct result for SUM(total) (for example) if COUNT(total) is 0 should be NULL.

This is carried out by the next compute scalar along (2). This also converts the COUNT result (Expr1008) from bigint to int and labels that as Expr1003

[Expr1003] = Scalar Operator(CONVERT_IMPLICIT(int,[Expr1008],0)),
[Expr1004] = Scalar Operator(CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END), 
[Expr1005] = Scalar Operator(CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012] END)

Finally the left most compute scalar (3) uses Expr1003 in the division operation...

[Expr1006] = Scalar Operator([Expr1004]/[Expr1003]), 
[Expr1007] = Scalar Operator([Expr1005]/[Expr1003])

... and outputs columns Expr1003, Expr1006, Expr1007 as the final result

PS: AVG has been supported much longer than SQL Server 2008. I imagine it has likely been available in the beginning. However it does not have the same semantics as your rewrite in the presence of NULLs anyway.

I assume order_id is the primary key and therefore not nullable but for a table with 10 orders and two NOT NULL total values of 2 and 4 then AVG(total) would be 3 but SUM(total) / COUNT(order_id) would be 0.6 (or 0 once integer division is taken into account).