Edit: this is second attempt
Based on @Hannah Vernon's answer, here is a way to bypass the use of CTE inside an inline subquery, which is like self-joining the CTE and I presume is the reason for the poor efficiency. It uses analytic functions available only in the 2012 version of SQL-Server. Tested at SQL-Fiddle
This part can be skipped from reading, it's a copy-paste from Hannah's answer:
;With AccountHierarchy AS
(
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchyMatch
, cast(Root.Acctid as varchar(4000)) IdHierarchy
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy
, cast(Root.Acctid as varchar(4000)) HierarchySort
, cast(Root.Name as varchar(4000)) HierarchyLabel ,
Root.CustomerCount CustomerCount
FROM
account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel
, CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.'
+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) IdHierarchyMatch
, cast(cast(recurse.Acctid as varchar(40)) + '.'
+ Root.IdHierarchy as varchar(4000)) IdHierarchy
, cast(replace(recurse.Name,'.','') + '.'
+ Root.NameHierarchy as varchar(4000)) NameHierarchy
, cast(Root.AccountName + '.'
+ Recurse.Name as varchar(4000)) HierarchySort
, cast(space(root.HierarchyLevel * 4)
+ Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
Here we order the rows of the CTE using the IdHierarchyMatch
and we calculate row numbers and a running total (from the next row up to the end.)
, cte1 AS
(
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.IdHierarchy
, h.NameHierarchy
, h.HierarchyLabel
, parsename(h.IdHierarchy,1) Acct1Id
, parsename(h.NameHierarchy,1) Acct1Name
, parsename(h.IdHierarchy,2) Acct2Id
, parsename(h.NameHierarchy,2) Acct2Name
, parsename(h.IdHierarchy,3) Acct3Id
, parsename(h.NameHierarchy,3) Acct3Name
, parsename(h.IdHierarchy,4) Acct4Id
, parsename(h.NameHierarchy,4) Acct4Name
, h.CustomerCount
, h.HierarchySort
, h.IdHierarchyMatch
, Rn = ROW_NUMBER() OVER
(ORDER BY h.IdHierarchyMatch)
, RunningCustomerCount = COALESCE(
SUM(h.CustomerCount)
OVER
(ORDER BY h.IdHierarchyMatch
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING)
, 0)
FROM
AccountHierarchy AS h
)
Then we have one more intermediate CTE where we use the previous running totals and row numbers - basically to find where the end points for the branches of the tree structure:
, cte2 AS
(
SELECT
cte1.*
, rn3 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn2 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn1 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc3 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc2 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc1 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM
cte1
)
and finally we build the last part:
SELECT
hier.AccountId
, hier.AccountName
--- -- columns skipped
, hier.CustomerCount
, TotalCustomerCount = hier.CustomerCount
+ hier.RunningCustomerCount
- ca.LastRunningCustomerCount
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM
cte2 hier
OUTER APPLY
( SELECT LastRunningCustomerCount, Rn
FROM
( SELECT LastRunningCustomerCount
= RunningCustomerCount, Rn
FROM (SELECT NULL a) x WHERE 4 <= HierarchyLevel
UNION ALL
SELECT rcc3, Rn3
FROM (SELECT NULL a) x WHERE 3 <= HierarchyLevel
UNION ALL
SELECT rcc2, Rn2
FROM (SELECT NULL a) x WHERE 2 <= HierarchyLevel
UNION ALL
SELECT rcc1, Rn1
FROM (SELECT NULL a) x WHERE 1 <= HierarchyLevel
) x
ORDER BY Rn
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY
) ca
ORDER BY
hier.HierarchySort ;
And a simplification, using the same cte1
as the code above. Test at SQL-Fiddle-2. Please note that both solutions work under the assumption that you have a maximum of four levels in your tree:
SELECT
hier.AccountId
--- -- skipping rows
, hier.CustomerCount
, TotalCustomerCount = CustomerCount
+ RunningCustomerCount
- CASE HierarchyLevel
WHEN 4 THEN RunningCustomerCount
WHEN 3 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 2 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 1 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
END
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM cte1 AS hier
ORDER BY
hier.HierarchySort ;
A third approach, with only one CTE, for the recursive part and then only window aggregate functions (SUM() OVER (...)
), so it should work in any version from 2005 upwards. Test at SQL-Fiddle-3 This solution assumes, like the previous ones, that there are 4 levels maximum in the hierarchy tree:
;WITH AccountHierarchy AS
(
SELECT
AccountId = Root.AcctId
, AccountName = Root.Name
, ParentId = Root.ParentId
, HierarchyLevel = 1
, HierarchySort = CAST(Root.Acctid AS VARCHAR(4000))
, HierarchyLabel = CAST(Root.Name AS VARCHAR(4000))
, Acct1Id = CAST(Root.Acctid AS VARCHAR(4000))
, Acct2Id = CAST(NULL AS VARCHAR(4000))
, Acct3Id = CAST(NULL AS VARCHAR(4000))
, Acct4Id = CAST(NULL AS VARCHAR(4000))
, Acct1Name = CAST(Root.Name AS VARCHAR(4000))
, Acct2Name = CAST(NULL AS VARCHAR(4000))
, Acct3Name = CAST(NULL AS VARCHAR(4000))
, Acct4Name = CAST(NULL AS VARCHAR(4000))
, CustomerCount = Root.CustomerCount
FROM
account AS Root
WHERE
Root.ParentID IS NULL
UNION ALL
SELECT
Recurse.Acctid
, Recurse.Name
, Recurse.ParentId
, Root.HierarchyLevel + 1
, CAST(Root.AccountName + '.'
+ Recurse.Name AS VARCHAR(4000))
, CAST(SPACE(Root.HierarchyLevel * 4)
+ Recurse.Name AS VARCHAR(4000))
, Root.Acct1Id
, CASE WHEN Root.HierarchyLevel = 1
THEN cast(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct2Id
END
, CASE WHEN Root.HierarchyLevel = 2
THEN CAST(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct3Id
END
, CASE WHEN Root.HierarchyLevel = 3
THEN CAST(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct4Id
END
, cast(Root.AccountName as varchar(4000))
, CASE WHEN Root.HierarchyLevel = 1
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct2Name
END
, CASE WHEN Root.HierarchyLevel = 2
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct3Name
END
, CASE WHEN Root.HierarchyLevel = 3
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct4Name
END
, Recurse.CustomerCount
FROM
account AS Recurse INNER JOIN
AccountHierarchy AS Root ON Root.AccountId = Recurse.ParentId
)
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, IdHierarchy =
CAST(COALESCE(h.Acct4Id+'.','')
+ COALESCE(h.Acct3Id+'.','')
+ COALESCE(h.Acct2Id+'.','')
+ h.Acct1Id AS VARCHAR(4000))
, NameHierarchy =
CAST(COALESCE(h.Acct4Name+'.','')
+ COALESCE(h.Acct3Name+'.','')
+ COALESCE(h.Acct2Name+'.','')
+ h.Acct1Name AS VARCHAR(4000))
, h.HierarchyLabel
, h.Acct1Id
, h.Acct1Name
, h.Acct2Id
, h.Acct2Name
, h.Acct3Id
, h.Acct3Name
, h.Acct4Id
, h.Acct4Name
, h.CustomerCount
, TotalCustomerCount =
CASE h.HierarchyLevel
WHEN 4 THEN h.CustomerCount
WHEN 3 THEN SUM(h.CustomerCount) OVER
(PARTITION BY h.Acct1Id, h.Acct2Id, h.Acct3Id)
WHEN 2 THEN SUM(h.CustomerCount) OVER
(PARTITION BY Acct1Id, h.Acct2Id)
WHEN 1 THEN SUM(h.CustomerCount) OVER
(PARTITION BY h.Acct1Id)
END
, h.HierarchySort
, IdHierarchyMatch =
CAST(h.Acct1Id
+ COALESCE('.'+h.Acct2Id,'')
+ COALESCE('.'+h.Acct3Id,'')
+ COALESCE('.'+h.Acct4Id,'') AS VARCHAR(4000))
FROM
AccountHierarchy AS h
ORDER BY
h.HierarchySort ;
A 4th approach, that calculates as an intermediate CTE, the closure table of the hierarchy. Test at SQL-Fiddle-4. The benefit is that for the sums calculations, there is no resctriction on the number of levels.
;WITH AccountHierarchy AS
(
-- skipping several line, identical to the 3rd approach above
)
, ClosureTable AS
(
SELECT
AccountId = Root.AcctId
, AncestorId = Root.AcctId
, CustomerCount = Root.CustomerCount
FROM
account AS Root
UNION ALL
SELECT
Recurse.Acctid
, Root.AncestorId
, Recurse.CustomerCount
FROM
account AS Recurse INNER JOIN
ClosureTable AS Root ON Root.AccountId = Recurse.ParentId
)
, ClosureGroup AS
(
SELECT
AccountId = AncestorId
, TotalCustomerCount = SUM(CustomerCount)
FROM
ClosureTable AS a
GROUP BY
AncestorId
)
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.HierarchyLabel
, h.CustomerCount
, cg.TotalCustomerCount
, h.HierarchySort
FROM
AccountHierarchy AS h
JOIN
ClosureGroup AS cg
ON cg.AccountId = h.AccountId
ORDER BY
h.HierarchySort ;
Best Answer
This is unlikely in simple cases. As Itzik Ben-Gan notes in his SQL Server Pro article, Pivoting Data when looking at the plan for a
PIVOT
query (emphasis added):For more advanced pivoting requirements that the (non-standard)
PIVOT
syntax does not directly support, workarounds are needed. These may or may not lead to worse performance compared withCASE
, depending on various factors including the skill level of the implementor.Examples of these problematic cases are covered in Itzik's article, and also well explained in Robert Sheldon's Simple Talk article, Questions About Pivoting Data in SQL Server You Were Too Shy to Ask.
My experience has been that
PIVOT
andAgg(CASE...
generate extremely similar plans with extremely close performance characteristics when both are written optimally. My usual advice is to write queries using whatever syntax feels most natural to you, and to only try rewrites if performance is not acceptable.Internals
The SQL Server query processor does have a built-in Pivot logical operator (
LogOp_Pivot
), so it is maybe not quite correct to say that SQL Server rewrites pivots to aggregates and case expressions, at least if we are talking about parsing and compilation activities that take place prior to cost-based optimization (trivial plans are not available for pivot queries).On the other hand, it is true that the only way the optimizer can implement a query tree containing
LogOp_Pivot
is via the exploration ruleExpandPivot
. This rule expandsLogOp_Pivot
into a normal grouped aggregate (LogOp_GbAgg
) with associated scalar expressions. When this rule is disabled, pivot queries fail to compile.In practice then, we can say that pivots are always (eventually) 'rewritten' as aggregates and scalar expressions before an executable plan can be produced.
Anyway, the result of the rewrite to
LogOp_GbAgg
is converted to the physical operators needed for an executable plan by the regular group-by aggregate implementation rulesGbAggToHS
(hash) orGbAggToStrm
(stream).As a side note, the reason 'manual pivots' (aggregates on case expressions) have an extra Compute Scalar below the aggregate is that the case expressions are pushed toward the leaf level of the query tree during Project Normalization (an early stage of compilation, before cost-based optimization).
Queries that use the
PIVOT
syntax do not have this because the expressions are not created untilExpandPivot
runs during cost-based optimization. At the (earlier) time Project Normalization runs, the query tree still hasLogOp_Pivot
elements, so there are no projections to push down, and the case expressions typically end up inside the hash or stream aggregate.There is typically no advantage in avoiding the Compute Scalar, since from SQL Server 2005 onward, expression evaluation is normally deferred until the result is required by a later operator. In this case, evaluation of the case expressions is deferred until the aggregate (hash or stream) requires it.