Sql-server – Merging into a self referential table with identity columns

sql-server-2008-r2t-sql

I have a number of records I need to add/update in a table that represents a hierarchical tree structure. The table has three columns

Id(PK, IDENTITY, int) | Description(varchar(50)) | ParentId(FK(Id), nullable, int)

A null value in the ParentId indicates the root node of a tree.
If possible, I'd like to insert/update these records in a single operation but since the identity isn't assigned until a record is inserted I'm having trouble figuring out how to capture this so child nodes' ParentId can be populated.

So given an existing table:

Id | Description | ParentId
1  | 'Foo'       | NULL
2  | 'Bar'       | 1
3  | 'Fizz'      | 1
4  | 'Buzz'      | 3

Is it even possible to insert a new root node and it's child nodes in a single operation?

I've attempted combining MERGE with UPDATE but it's either not possible to reference the columns of INSERTED in the same operation that populates them or else I just don't know how to do it.

Best Answer

This is not a single operation, but a loop that will run n number of times where n is the number of levels in your staging hierarchy.

This uses recursive common table expressions to build the hierarchy path for both a given target table (tree) and a given staging table (staging). The first example uses views containing the recursive code, and the second uses inline ctes if views are not an option.

This solution requires that the entire ancestor path be in the staging table (i.e. from the root node up), and that the combination of parentid,description is unique for each table.

The full code is available in the demos, but here is the recursive cte view for the staging table:

create view dbo.staging_path as 
with cte as (
  select id, description, parentid
      , path = convert(varchar(512),description)
      , parent_path = convert(varchar(512),null) 
  from dbo.staging t
  where t.ParentId is null
  union all
  select c.id, c.description, c.parentid
      , path = convert(varchar(512),p.path+'|'+c.description)
      , parent_path = convert(varchar(512),isnull(p.parent_path+'|','')+p.description)
  from dbo.staging c
    inner join cte p
      on c.parentid = p.id
  )
select id, description, parentid, path, parent_path
from cte;

which returns:

+----+-------------+----------+---------------+-------------+
| id | description | parentid |     path      | parent_path |
+----+-------------+----------+---------------+-------------+
|  1 | Foo         | null     | Foo           | null        |
|  2 | Bar         | 1        | Foo|Bar       | Foo         |
|  3 | Fizz        | 1        | Foo|Fizz      | Foo         |
|  4 | Buzz        | 3        | Foo|Fizz|Buzz | Foo|Fizz    |
+----+-------------+----------+---------------+-------------+

dbfiddle.uk with two recursive views

declare @rc int = 1;

while @rc > 0
begin;
  insert into tree (description, parentid)
    select sp.description, parentid=tp.id
    from dbo.staging_path sp
      left join dbo.tree_path tp
        on sp.parent_path = tp.path
    where not exists (
      select 1
      from dbo.tree_path i
      where sp.path = i.path
      )
      and (tp.id is not null or sp.parentid is null);

  set @rc = @@rowcount;
end;

If you can not create views to simplify the code as above, you can use inline recursive ctes like so:

dbfiddle.uk without views

declare @rc int = 1;

while @rc > 0
  begin;
    with tree_path as (
      select id, description, parentid
          , path = convert(varchar(512),description)
      from dbo.tree t
      where t.ParentId is null
      union all
      select c.id, c.description, c.parentid
          , path = convert(varchar(512),p.path+'|'+c.description)
      from dbo.tree c
        inner join tree_path p
          on c.parentid = p.id
    )
    , staging_path as (
      select id, description, parentid
          , path = convert(varchar(512),description)
          , parent_path = convert(varchar(512),null)
      from dbo.staging t
      where t.ParentId is null
      union all
      select c.id, c.description, c.parentid
          , path = convert(varchar(512),p.path+'|'+c.description)
          , parent_path = convert(varchar(512),isnull(p.parent_path+'|','')+p.description)
      from dbo.staging c
        inner join staging_path p
          on c.parentid = p.id
      )
    insert into tree (description, parentid)
    select sp.description, parentid=tp.id
    from staging_path sp
      left join tree_path tp
        on sp.parent_path = tp.path
    where not exists (
      select 1
      from tree_path i
      where sp.path = i.path
      )
      and (tp.id is not null or sp.parentid is null);

    set @rc = @@rowcount;
  end;


dbfiddle.uk with recursive view for target table, inline cte for staging

Each option results in tree having the following rows (identity was set to start at 41 for this table to differentiate the id from staging):

+----+-------------+----------+
| Id | Description | ParentId |
+----+-------------+----------+
| 41 | Foo         | null     |
| 42 | Bar         | 41       |
| 43 | Fizz        | 41       |
| 44 | Buzz        | 43       |
+----+-------------+----------+