Sql-server – Recursive cte avoiding loops

performancequery-performancerecursivesql serversql-server-2008-r2

I'm trying to write a recursive CTE to explore a workflow system. Unfortunately, I'm getting maximum recursion errors due to loops:

with cteActs as
 (
  select a.id as [id], aa.TASKNEXTID as [childid]
  from TASK a
  inner join TASKNEXT aa on a.id = aa.TASKPARENTID
  where a.id != aa.TASKNEXTID
  ),
 cteNext as
(
    select a.*
    from cteActs as a
    where a.id=42
    union all
    select a.*
    from cteActs as a
    inner join cteNext as n on a.id = n.childid
    )
select * 
from cteNext

The table TASK is a list of tasks whereby 42 is "Start job" for example. TASKNEXT links 42 to possible subtasks back in the TASK table. e.g. it may link 42 to 43 which may be "Find materials" e.g.

ID, name, childID
42, Start job, 43
42, Start job, 44
43, Find materials, 200
44, Report to boss, 201
201, Discuss with boss, 202
202, Receive payment, 44

I think the recursion is dying because 44>201>202>44 creates a loop the query doesn't escape from. How can I allow for this? Most of the examples/tutorials I read assume a strict parent>child relationship, where a child can never be the parent of something in its own higher tree.

What I'm trying to get to is a distinct list of TASKS that originate from flows starting at task 42, or wherever I choose.


This is iteration 2 which may work, but runs so slow:

  select a.id as [id], aa.ACTIONACTIVITYID as [childid]
   into #temp
  from TASK a
  inner joinTASKNEXT aa on a.id = aa.TASKPARENTID
  where a.id != aa.TASKNEXTID    
  create clustered index [hello] on #temp (ID ASC)
  create nonclustered index [hello2] on #temp (childid ASC)
  ;
with cteNext as
(
    select a.*, 
    cast(',' + cast(a.ID as varchar(10)) + ',' as varchar(max)) as Path,
    0 as [cyc]
    from #temp as a
    where a.id=42
    union all
    select a.*,
    n.Path + cast(a.ID as varchar(10)) + ',',
         case when n.Path like '%,'+cast(a.ID as varchar(10))+',%' 
           then 1 
           else 0 
         end as [cyc]
    from #temp as a
    inner join cteNext as n on a.id = n.childid
    where n.cyc = 0
    )

select   id, childid
from cteNext
where cyc =0

Execution plan

Best Answer

So the best result I've managed to come up with is adding the following improvements
- optimising the queried table (creating a new temporary table with suggested indexes)
- adding a path element to check I'm not revisiting an existing part of the path
- adding a depth counter as a limiter. This does however mean I'm consciously choosing not to have the full result set

  select a.id as [id], aa.ACTIONACTIVITYID as [childid]
   into #temp
  from TASK a
  inner join TASKNEXT aa on a.id = aa.TASKPARENTID
  where a.id != aa.TASKNEXTID

  create clustered index [hello] on #temp (ID ASC)
  create nonclustered index [hello2] on #temp (childid ASC)
  ;
  drop table #temp2;
with cteNext as
(
    select a.*, 
    cast(',' + cast(a.ID as varchar(10)) + ',' as varchar(max)) as Path,
    0 as [cyc], 0 as [depth]
    from #temp as a
    where a.id=42
    union all
    select a.*,
    n.Path + cast(a.ID as varchar(10)) + ',',
         case when n.Path like '%,'+cast(a.ID as varchar(10))+',%' 
           then 1 
           else 0 
         end as [cyc], n.depth+1 as [depth]
    from #temp as a
    inner join cteNext as n on a.id = n.childid
    where n.cyc = 0 and n.depth <=11
    )

select *
into #temp2
from cteNext
where cyc =0

The initial purpose of this data is "fluffy", I'm generating a graph-map out of it to show our workflows so a limited depth is fine in the first instance at least. But better answers accepted if anyone has any