Sql-server – How to write a query which finds all circular references when a table references itself? (II)

foreign keysql serversql-server-2008-r2

I have the following schema (names changed), which I cannot change:

CREATE TABLE MyTable (
    Id INT NOT NULL PRIMARY KEY,
    ParentId INT NOT NULL
);

ALTER TABLE MyTable ADD FOREIGN KEY (ParentId) REFERENCES MyTable(Id);

That is, each record is a child of another record.

The root node is the record with Id equal to 0.

I want to run query which will find all circular references. For example, with the data

INSERT INTO MyTable (Id, ParentId) VALUES
    (0, 0),
    (1, 0),
    (2, 4),
    (3, 2),
    (4, 3),
    (5, 6),
    (6, 5),
    (7, 5),
    (8, 8);

the query should return

Id | Cycle
2  | 2 > 4 > 3 > 2
3  | 3 > 2 > 4 > 3
4  | 4 > 3 > 2 > 4
5  | 5 > 6 > 5
6  | 6 > 5 > 6
7  | 7 > 5 > 6 > 5
8  | 8

This is similar to my question How to write a query which finds all circular references when a table references itself? from last year, except for the following differences:

  • The root node is hard-coded to have an Id of 0.
  • Any self-cycle (8 in the example above) except for the root should be considered a cycle
  • The accepted answer runs into infinite recursion on the case 7 > 5 > 6 > 5 in the example above

I am having trouble coming up with a query to account for these. Suggestions?

Best Answer

I think you're looking for a recursive common table expression.

WITH rcte (Id, ParentId, [Path])
AS ( --- Anchor:
     SELECT Id, ParentId,
            CAST(CAST(Id AS varchar(10))+' > '+
         CAST(ParentID AS varchar(10)) AS varchar(max))
     FROM MyTable

     UNION ALL

     --- Recursion:
     SELECT t.Id, rcte.ParentId,
        CAST(CAST(t.Id AS varchar(10))+' > '+
         rcte.[Path] AS varchar(max))
     FROM rcte
     INNER JOIN MyTable AS t ON rcte.Id=t.ParentId
     WHERE rcte.ParentID!=rcte.Id)

--- Returns rows where parent is child, i.e. self-recursion:
SELECT ISNULL(b.Id, a.Id) AS Id, ISNULL(b.[Path]+SUBSTRING(a.[Path], LEN(CAST(a.Id AS varchar(10)))+1, LEN(a.[Path])), a.[Path])
FROM rcte AS a
LEFT JOIN rcte AS b ON b.ParentId=a.Id AND b.Id NOT IN (SELECT ParentId FROM MyTable)
WHERE a.Id=a.ParentId
ORDER BY a.Id
OPTION (MAXRECURSION 0);

First off, every relation is a potential anchor in the tree. Then union those anchors to their respective relations (working your way down the tree for each recursion). The recursion stops when Id is ParentId, i.e. when you've come "full circle", or when there is nothing left to recurse.

Finally, the returned resultset is filtered for only those records where Id=ParentID. You might want to add a filter to exclude the 0-member as well, not sure how you wanted it.

Edit: I've LEFT JOIN'ed rcte again (as "b") to catch any branches leading up to a self-recursion. The query still doesn't precisely return the data in the question, but perhaps it's close enough. :)

A quick intro to recursive CTEs from a blog post of mine.