Edit: this is second attempt
Based on @Hannah Vernon's answer, here is a way to bypass the use of CTE inside an inline subquery, which is like self-joining the CTE and I presume is the reason for the poor efficiency. It uses analytic functions available only in the 2012 version of SQL-Server. Tested at SQL-Fiddle
This part can be skipped from reading, it's a copy-paste from Hannah's answer:
;With AccountHierarchy AS
(
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchyMatch
, cast(Root.Acctid as varchar(4000)) IdHierarchy
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy
, cast(Root.Acctid as varchar(4000)) HierarchySort
, cast(Root.Name as varchar(4000)) HierarchyLabel ,
Root.CustomerCount CustomerCount
FROM
account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel
, CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.'
+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) IdHierarchyMatch
, cast(cast(recurse.Acctid as varchar(40)) + '.'
+ Root.IdHierarchy as varchar(4000)) IdHierarchy
, cast(replace(recurse.Name,'.','') + '.'
+ Root.NameHierarchy as varchar(4000)) NameHierarchy
, cast(Root.AccountName + '.'
+ Recurse.Name as varchar(4000)) HierarchySort
, cast(space(root.HierarchyLevel * 4)
+ Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
Here we order the rows of the CTE using the IdHierarchyMatch
and we calculate row numbers and a running total (from the next row up to the end.)
, cte1 AS
(
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.IdHierarchy
, h.NameHierarchy
, h.HierarchyLabel
, parsename(h.IdHierarchy,1) Acct1Id
, parsename(h.NameHierarchy,1) Acct1Name
, parsename(h.IdHierarchy,2) Acct2Id
, parsename(h.NameHierarchy,2) Acct2Name
, parsename(h.IdHierarchy,3) Acct3Id
, parsename(h.NameHierarchy,3) Acct3Name
, parsename(h.IdHierarchy,4) Acct4Id
, parsename(h.NameHierarchy,4) Acct4Name
, h.CustomerCount
, h.HierarchySort
, h.IdHierarchyMatch
, Rn = ROW_NUMBER() OVER
(ORDER BY h.IdHierarchyMatch)
, RunningCustomerCount = COALESCE(
SUM(h.CustomerCount)
OVER
(ORDER BY h.IdHierarchyMatch
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING)
, 0)
FROM
AccountHierarchy AS h
)
Then we have one more intermediate CTE where we use the previous running totals and row numbers - basically to find where the end points for the branches of the tree structure:
, cte2 AS
(
SELECT
cte1.*
, rn3 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn2 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn1 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc3 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc2 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc1 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM
cte1
)
and finally we build the last part:
SELECT
hier.AccountId
, hier.AccountName
--- -- columns skipped
, hier.CustomerCount
, TotalCustomerCount = hier.CustomerCount
+ hier.RunningCustomerCount
- ca.LastRunningCustomerCount
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM
cte2 hier
OUTER APPLY
( SELECT LastRunningCustomerCount, Rn
FROM
( SELECT LastRunningCustomerCount
= RunningCustomerCount, Rn
FROM (SELECT NULL a) x WHERE 4 <= HierarchyLevel
UNION ALL
SELECT rcc3, Rn3
FROM (SELECT NULL a) x WHERE 3 <= HierarchyLevel
UNION ALL
SELECT rcc2, Rn2
FROM (SELECT NULL a) x WHERE 2 <= HierarchyLevel
UNION ALL
SELECT rcc1, Rn1
FROM (SELECT NULL a) x WHERE 1 <= HierarchyLevel
) x
ORDER BY Rn
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY
) ca
ORDER BY
hier.HierarchySort ;
And a simplification, using the same cte1
as the code above. Test at SQL-Fiddle-2. Please note that both solutions work under the assumption that you have a maximum of four levels in your tree:
SELECT
hier.AccountId
--- -- skipping rows
, hier.CustomerCount
, TotalCustomerCount = CustomerCount
+ RunningCustomerCount
- CASE HierarchyLevel
WHEN 4 THEN RunningCustomerCount
WHEN 3 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 2 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 1 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
END
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM cte1 AS hier
ORDER BY
hier.HierarchySort ;
A third approach, with only one CTE, for the recursive part and then only window aggregate functions (SUM() OVER (...)
), so it should work in any version from 2005 upwards. Test at SQL-Fiddle-3 This solution assumes, like the previous ones, that there are 4 levels maximum in the hierarchy tree:
;WITH AccountHierarchy AS
(
SELECT
AccountId = Root.AcctId
, AccountName = Root.Name
, ParentId = Root.ParentId
, HierarchyLevel = 1
, HierarchySort = CAST(Root.Acctid AS VARCHAR(4000))
, HierarchyLabel = CAST(Root.Name AS VARCHAR(4000))
, Acct1Id = CAST(Root.Acctid AS VARCHAR(4000))
, Acct2Id = CAST(NULL AS VARCHAR(4000))
, Acct3Id = CAST(NULL AS VARCHAR(4000))
, Acct4Id = CAST(NULL AS VARCHAR(4000))
, Acct1Name = CAST(Root.Name AS VARCHAR(4000))
, Acct2Name = CAST(NULL AS VARCHAR(4000))
, Acct3Name = CAST(NULL AS VARCHAR(4000))
, Acct4Name = CAST(NULL AS VARCHAR(4000))
, CustomerCount = Root.CustomerCount
FROM
account AS Root
WHERE
Root.ParentID IS NULL
UNION ALL
SELECT
Recurse.Acctid
, Recurse.Name
, Recurse.ParentId
, Root.HierarchyLevel + 1
, CAST(Root.AccountName + '.'
+ Recurse.Name AS VARCHAR(4000))
, CAST(SPACE(Root.HierarchyLevel * 4)
+ Recurse.Name AS VARCHAR(4000))
, Root.Acct1Id
, CASE WHEN Root.HierarchyLevel = 1
THEN cast(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct2Id
END
, CASE WHEN Root.HierarchyLevel = 2
THEN CAST(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct3Id
END
, CASE WHEN Root.HierarchyLevel = 3
THEN CAST(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct4Id
END
, cast(Root.AccountName as varchar(4000))
, CASE WHEN Root.HierarchyLevel = 1
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct2Name
END
, CASE WHEN Root.HierarchyLevel = 2
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct3Name
END
, CASE WHEN Root.HierarchyLevel = 3
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct4Name
END
, Recurse.CustomerCount
FROM
account AS Recurse INNER JOIN
AccountHierarchy AS Root ON Root.AccountId = Recurse.ParentId
)
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, IdHierarchy =
CAST(COALESCE(h.Acct4Id+'.','')
+ COALESCE(h.Acct3Id+'.','')
+ COALESCE(h.Acct2Id+'.','')
+ h.Acct1Id AS VARCHAR(4000))
, NameHierarchy =
CAST(COALESCE(h.Acct4Name+'.','')
+ COALESCE(h.Acct3Name+'.','')
+ COALESCE(h.Acct2Name+'.','')
+ h.Acct1Name AS VARCHAR(4000))
, h.HierarchyLabel
, h.Acct1Id
, h.Acct1Name
, h.Acct2Id
, h.Acct2Name
, h.Acct3Id
, h.Acct3Name
, h.Acct4Id
, h.Acct4Name
, h.CustomerCount
, TotalCustomerCount =
CASE h.HierarchyLevel
WHEN 4 THEN h.CustomerCount
WHEN 3 THEN SUM(h.CustomerCount) OVER
(PARTITION BY h.Acct1Id, h.Acct2Id, h.Acct3Id)
WHEN 2 THEN SUM(h.CustomerCount) OVER
(PARTITION BY Acct1Id, h.Acct2Id)
WHEN 1 THEN SUM(h.CustomerCount) OVER
(PARTITION BY h.Acct1Id)
END
, h.HierarchySort
, IdHierarchyMatch =
CAST(h.Acct1Id
+ COALESCE('.'+h.Acct2Id,'')
+ COALESCE('.'+h.Acct3Id,'')
+ COALESCE('.'+h.Acct4Id,'') AS VARCHAR(4000))
FROM
AccountHierarchy AS h
ORDER BY
h.HierarchySort ;
A 4th approach, that calculates as an intermediate CTE, the closure table of the hierarchy. Test at SQL-Fiddle-4. The benefit is that for the sums calculations, there is no resctriction on the number of levels.
;WITH AccountHierarchy AS
(
-- skipping several line, identical to the 3rd approach above
)
, ClosureTable AS
(
SELECT
AccountId = Root.AcctId
, AncestorId = Root.AcctId
, CustomerCount = Root.CustomerCount
FROM
account AS Root
UNION ALL
SELECT
Recurse.Acctid
, Root.AncestorId
, Recurse.CustomerCount
FROM
account AS Recurse INNER JOIN
ClosureTable AS Root ON Root.AccountId = Recurse.ParentId
)
, ClosureGroup AS
(
SELECT
AccountId = AncestorId
, TotalCustomerCount = SUM(CustomerCount)
FROM
ClosureTable AS a
GROUP BY
AncestorId
)
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.HierarchyLabel
, h.CustomerCount
, cg.TotalCustomerCount
, h.HierarchySort
FROM
AccountHierarchy AS h
JOIN
ClosureGroup AS cg
ON cg.AccountId = h.AccountId
ORDER BY
h.HierarchySort ;
You're missing a PRIOR
for the second condition in the connect by
.
SQL> select * from TREETEST
start with PARENTID = 1 and PARENTTYPE = 'A'
connect by
prior NODEID = PARENTID
and prior NODETYPE = PARENTTYPE; -- note the PRIOR here too
NODEID N PARENTID P
---------- - ---------- -
2 A 1 A
3 A 1 A
Your query looks a bit odd to me, but that might very well be legit. I would have expected the first line in your sample data to be your root node, and thus a query like this:
SQL> select * from TREETEST
start with NODEID = 1 and NODETYPE = 'A'
connect by
prior NODEID = PARENTID
and prior NODETYPE = PARENTTYPE;
NODEID N PARENTID P
---------- - ---------- -
1 A
2 A 1 A
3 A 1 A
And if you didn't want the parent to show up in the result set:
SQL> select * from TREETEST
where level > 1 -- don't return the root node
start with NODEID = 1 and NODETYPE = 'A'
connect by
prior NODEID = PARENTID
and prior NODETYPE = PARENTTYPE;
NODEID N PARENTID P
---------- - ---------- -
2 A 1 A
3 A 1 A
Some references from the Hierarchical Queries documentation:
PRIOR is a unary operator and has the same precedence as the unary + and - arithmetic operators. It evaluates the immediately following expression for the parent row of the current row in a hierarchical query.
If you look at the railroad diagram there, you'll see the syntax is:
starts with condition connect by [nocycle] condition [AND condition]*
And in this case, for the connect by part, each condition is a simple comparison condition with the syntax:
expression operator expression
The PRIOR
operator doesn't apply to the whole condition, it is part of the expression(s) in the conditions. I.e. it's as if it was parsed like this:
connect by ( PRIOR(child_column) = parent_column )
(PRIOR
can also appear on the right-hand side.)
Your example query:
CONNECT BY PRIOR nodeid = PARENTID AND NODETYPE = PARENTTYPE;
is interpreted as:
CONNECT BY ( PRIOR(nodeid) = PARENTID ) AND ( NODETYPE = PARENTTYPE );
The second condition (after the AND
) applies to the type columns of the current row only, it doesn't reference the parent row at all. The PRIOR
is part of the first condition only, and doesn't somehow influence to the second.
Best Answer
The technique to use is called a recursive common table expression, or recursive CTE. The Microsoft SQL documentation is comprehensive and there are a lot of third-party blog posts to help explain.
Here's an implementation for your specific question.
This part just creates the sample data from your question. It uses a temporary table (one that uses the # symbol)
This is the value from which to start looking:
This is the recursive CTE. You do not have to call it "cte" as in this example. The initial semicolon is redundant. It ensures the previous statement is properly ended if the programmer has been lazy. It does no harm to leave it.