WHERE vs HAVING on Non-Aggregate Columns – Advantages and Disadvantages

aggregategroup byperformancequery-performancesql servert-sql

I'm in the process of rewriting queries that no longer pull in all the required data. My question is in regard to a practice I've never seen and haven't found any question on StackExchange that specifically addresses the issue.

I know the point of the HAVING statement is to introduce conditions on aggregations, just like WHERE introduces conditions on individual rows. However, what I'm seeing in this code is HAVING being used in lieu of WHERE on queries with aggregations. The conditions in HAVING are not applied against the aggregations, but on the non-aggregated columns.

For example:

SELECT id, filedate, SUM(amount)
FROM Sales
GROUP BY id, filedate
HAVING id = 123 AND filedate = '1/1/2018'

As opposed to:

SELECT id, filedate, SUM(amount)
FROM Sales
WHERE id = 123 AND filedate = '1/1/2018'
GROUP BY id, filedate

Are there performance implications or other advantages/disadvantages to this strategy?

I haven't tried running diagnostics myself, not a priority and I'd have to do it on my own time. However, I think I may if there isn't a clear answer on this.

My concern is how the optimizer views this query. Does it aggregate all data and then restrict the result set based on the HAVING clause, or does it realize it can apply the having conditions on the individual rows since they are specifically referencing non-aggregated columns?

EDIT: For my example queries and the actual SQL I'm rewriting, the plans are identical, but the queries are of similar complexity and I'm not yet knowledgeable enough to draw conclusions from the identical plans.

Best Answer

The conditions in HAVING are not applied against the aggregations, but on the non-aggregated columns.

The problem here is in how you are describing what the HAVING clause applies to. The HAVING clause always applies to aggregated fields, which is all remaining columns post-aggregation. You are trying to show / say that the HAVING clause is not being applied to any aggregate functions, which is what they usually apply to. But in reality, the HAVING clause governs the result of that aggregate function, or, in your first example, the result of the grouping column. But in both cases, the aggregation has already been performed.

So in terms of performance (not to mention readability for others trying to update this code later), you use the WHERE clause to filter down to what will be aggregated, and then the HAVING clause to filter out what has been aggregated. And, while the result of a simple test as shown in the question masks the difference between the timing of the two (or logical placement in the sequence that the query is processed in) such that they "appear" to be doing the same thing, I would be quite surprised if it was not less efficient to aggregate a bunch of rows only to throw them out later when logically they could have been eliminated prior to storing / computing the aggregations. HOWEVER, if you do see that they execution plans are similar for this simple example, I am willing to bet that it is merely due to the optimizer seeing that it would be more efficient to make those HAVING conditions actual WHERE conditions as it rewrites the query before it executes it. But in that case, I still would advise against writing queries this way because you are making the optimizer take extra time to rewrite bad code when it should be spending that time / CPU cycles finding a more efficient plan. @DavidSpillett added (in a comment on this answer): "Furthermore, you are relying on the query planner seeing the optimisation potential, which it may not in more complex queries or if your code ends up ported to another database (or even just an older version of SQL Server)".

For what it's worth, even the Microsoft documentation for the HAVING clause stated that it acted as a WHERE clause when no GROUP BY was present. Now that the documentation is on GitHub, I was able to correct it recently via Pull Request #235: Correct and improve HAVING clause.

Best Answer

Related Solutions

SQL Server Index Tuning – Non-Clustered Index Key Columns for Variable WHERE Clause

Sql-server – Alternatives to joins using Linked Server and OPENQUERY

Related Question