Sql-server – Which is faster, SUM(CASE) or CTE PIVOT

performancepivotquery-performancesql server

There are two types of ways to perform a PIVOT. Before SQL Server 2005, when PIVOT was introduced, most people did this:

SELECT RateID
              SUM(CASE WHEN RateItemTypeID = 1 THEN UnitPrice ELSE 0 END),
              SUM(CASE WHEN RateItemTypeID = 2 THEN UnitPrice ELSE 0 END),
              SUM(CASE WHEN RateItemTypeID = 3 THEN UnitPrice ELSE 0 END)
              FROM rate_item WHERE _WhereClause_
              GROUP BY RateID

Later, when 2005 introduced PIVOT it became this:

   SELECT RateID, [1], [2], [3]
          FROM PertinentRates -- PertinentRates is a CTE with WHERE clause applied
          PIVOT (SUM(UnitPrice) FOR RateItemTypeID IN ([1], [2], [3])) PVT)

Across SQL Server 2005, 2008 R2, 2012 and 2014 (the versions of SQL Server I've worked with that implement PIVOT), in my experience, it has always been faster than SUM(CASE) or in a few cases equally fast. Are there examples where PIVOT is slower?

I can't give the DDL because it is an example from my work. But the table is pretty simple. In the PIVOT example it is drawing from a CTE whereas the SUM(CASE) is drawing directly from the table. But the SUM(CASE) performs the same drawing from the CTE.

In my work example, the PIVOT comes back in 10 seconds while the SUM(CASE) comes back in 14. Clearly it must be doing something different under the covers. The plans are the same, 50% of total each. PIVOT converted to SUM(CASE) in the query analyzer. Yet SUM(CASE) never comes back in less than 13 seconds, and PIVOT never comes back in over 11 seconds.

I've tried running them back and forth, it doesn't matter the order they are run in. If I run them both from a cold cache they both take longer, but PIVOT is still faster, 12 vs 17 seconds. Can't reproduce on a second server, but that one is considerably better; it's 5 seconds each there with minor variations. PIVOT is a little better, but percentage-wise it doesn't have the same edge as on the first server.

The IO stats, like the query plan, is identical between the two. That is odd, I had kind of expected to see different IO stats, even though I've never looked at them for this particular example.

Best Answer

Are there examples where PIVOT is slower?

This is unlikely in simple cases. As Itzik Ben-Gan notes in his SQL Server Pro article, Pivoting Data when looking at the plan for a PIVOT query (emphasis added):

Figure 3 shows the plan for the PIVOT query. As you can see, this plan is very similar to that of the standard solution—so much so that if you look at the properties of the Aggregate operator, under Defined Values, you’ll find that SQL Server constructed CASE expressions behind the scenes:


[Expr1022] = Scalar Operator(SUM(CASE WHEN [InsideTSQL2008].[Sales].[Orders].[shipcity]=N'Barcelona' THEN [InsideTSQL2008].[Sales].[Orders].[freight] ELSE NULL END))

With this in mind, you shouldn’t expect the solution that’s based on the PIVOT operator to perform better than the standard solution. The main benefit in the PIVOT operator at the moment is that it’s less verbose.

For more advanced pivoting requirements that the (non-standard) PIVOT syntax does not directly support, workarounds are needed. These may or may not lead to worse performance compared with CASE, depending on various factors including the skill level of the implementor.

Examples of these problematic cases are covered in Itzik's article, and also well explained in Robert Sheldon's Simple Talk article, Questions About Pivoting Data in SQL Server You Were Too Shy to Ask.

My experience has been that PIVOT and Agg(CASE... generate extremely similar plans with extremely close performance characteristics when both are written optimally. My usual advice is to write queries using whatever syntax feels most natural to you, and to only try rewrites if performance is not acceptable.

Internals

The SQL Server query processor does have a built-in Pivot logical operator (LogOp_Pivot), so it is maybe not quite correct to say that SQL Server rewrites pivots to aggregates and case expressions, at least if we are talking about parsing and compilation activities that take place prior to cost-based optimization (trivial plans are not available for pivot queries).

On the other hand, it is true that the only way the optimizer can implement a query tree containing LogOp_Pivot is via the exploration rule ExpandPivot. This rule expands LogOp_Pivot into a normal grouped aggregate (LogOp_GbAgg) with associated scalar expressions. When this rule is disabled, pivot queries fail to compile.

In practice then, we can say that pivots are always (eventually) 'rewritten' as aggregates and scalar expressions before an executable plan can be produced.

Anyway, the result of the rewrite to LogOp_GbAgg is converted to the physical operators needed for an executable plan by the regular group-by aggregate implementation rules GbAggToHS (hash) or GbAggToStrm (stream).

As a side note, the reason 'manual pivots' (aggregates on case expressions) have an extra Compute Scalar below the aggregate is that the case expressions are pushed toward the leaf level of the query tree during Project Normalization (an early stage of compilation, before cost-based optimization).

Queries that use the PIVOT syntax do not have this because the expressions are not created until ExpandPivot runs during cost-based optimization. At the (earlier) time Project Normalization runs, the query tree still has LogOp_Pivot elements, so there are no projections to push down, and the case expressions typically end up inside the hash or stream aggregate.

There is typically no advantage in avoiding the Compute Scalar, since from SQL Server 2005 onward, expression evaluation is normally deferred until the result is required by a later operator. In this case, evaluation of the case expressions is deferred until the aggregate (hash or stream) requires it.