Sql-server – How to write a query which finds all circular references when a table references itself

foreign keysql serversql-server-2008-r2

I have the following schema (names changed), which I cannot change:

CREATE TABLE MyTable (
    Id INT NOT NULL PRIMARY KEY,
    ParentId INT NOT NULL
);

ALTER TABLE MyTable ADD FOREIGN KEY (ParentId) REFERENCES MyTable(Id);

That is, each record is a child of another record. If a record’s ParentId is equal to its Id, then the record is considered a root node.

I want to run query which will find all circular references. For example, with the data

INSERT INTO MyTable (Id, ParentId) VALUES
    (0, 0),
    (1, 0),
    (2, 4),
    (3, 2),
    (4, 3);

the query should return

Id | Cycle
2  | 2 < 4 < 3 < 2
3  | 3 < 2 < 4 < 3
4  | 4 < 3 < 2 < 4

I wrote the following query for SQL Server 2008 R2, and I am wondering if this query can be improved:

IF OBJECT_ID(N'tempdb..#Results') IS NOT NULL DROP TABLE #Results;
CREATE TABLE #Results (Id INT, HasParentalCycle BIT, Cycle VARCHAR(MAX));

DECLARE @i INT,
    @j INT,
    @flag BIT,
    @isRoot BIT,
    @ids VARCHAR(MAX);

DECLARE MyCursor CURSOR FAST_FORWARD FOR
    SELECT Id
    FROM MyTable;

OPEN MyCursor;
FETCH NEXT FROM MyCursor INTO @i;
WHILE @@FETCH_STATUS = 0
BEGIN
    IF OBJECT_ID(N'tempdb..#Parents') IS NOT NULL DROP TABLE #Parents;
    CREATE TABLE #Parents (Id INT);

    SET @ids = NULL;
    SET @isRoot = 0;
    SET @flag = 0;
    SET @j = @i;
    INSERT INTO #Parents (Id) VALUES (@j);

    WHILE (1=1)
    BEGIN
        SELECT
            @j = ParentId,
            @isRoot = CASE WHEN ParentId = Id THEN 1 ELSE 0 END
        FROM MyTable
        WHERE Id = @j;

        IF (@isRoot = 1)
        BEGIN
            SET @flag = 0;
            BREAK;
        END        

        IF EXISTS (SELECT 1 FROM #Parents WHERE Id = @j)
        BEGIN
            INSERT INTO #Parents (Id) VALUES (@j);
            SET @flag = 1;
            SELECT @ids = COALESCE(@ids + ' < ', '') + CAST(Id AS VARCHAR) FROM #Parents;
            BREAK;
        END
        ELSE
        BEGIN
            INSERT INTO #Parents (Id) VALUES (@j);
        END        
    END

    INSERT INTO #Results (Id, HasParentalCycle, Cycle) VALUES (@i, @flag, @ids);

    FETCH NEXT FROM MyCursor INTO @i;
END
CLOSE MyCursor;
DEALLOCATE MyCursor;

SELECT Id, Cycle
FROM #Results
WHERE HasParentalCycle = 1;

Best Answer

This calls for a recursive CTE:

WITH FindRoot AS
(
    SELECT Id,ParentId, CAST(Id AS NVARCHAR(MAX)) Path
    FROM dbo.MyTable

    UNION ALL

    SELECT C.Id, P.ParentId, C.Path + N' > ' + CAST(P.Id AS NVARCHAR(MAX))
    FROM dbo.MyTable P
    JOIN FindRoot C
    ON C.ParentId = P.Id AND P.ParentId <> P.Id AND C.ParentId <> C.Id
 )
SELECT *
FROM FindRoot R
WHERE R.Id = R.ParentId 
  AND R.ParentId <> 0;

See it in action here: SQL Fiddle


Update:

Added distance to be able to exclude all self cycles (see ypercube's comment):

WITH FindRoot AS
(
    SELECT Id,ParentId, CAST(Id AS NVARCHAR(MAX)) Path, 0 Distance
    FROM dbo.MyTable

    UNION ALL

    SELECT C.Id, P.ParentId, C.Path + N' > ' + CAST(P.Id AS NVARCHAR(MAX)), C.Distance + 1
    FROM dbo.MyTable P
    JOIN FindRoot C
    ON C.ParentId = P.Id AND P.ParentId <> P.Id AND C.ParentId <> C.Id
 )
SELECT *
FROM FindRoot R
WHERE R.Id = R.ParentId 
  AND R.ParentId <> 0
  AND R.Distance > 0;

SQL Fiddle

Which one you should use depends on your requirement.