SQL Server – Best Way to Get Root ID from Child Table in Large Tree

queryquery-performancesql servertabletree

I always used database design using this pattern;

root table with PK id1 
child1 table with PK id2 FK id1
child2 table with PK id3 FK id2
child3 table with PK id4 FK id3
etc...

so if I have to get id1 from the child5 table I need to traverse all child tables up to the root one (multiple joins)

ex;

select root.id
from root
   inner join child1 ...
   inner join child2 ...
   inner join child3 ...
   inner join child4 ...
   inner join child5 ...
where child5.id = X

is that "ok" as a design or there is a way to optimize it?

I could see a few ways to do it;

  1. bring the IDX as an FK in the table that I expect, so I could modify (add the proper FK) down the line to any child table as needed
  2. just add the FK as I create a new table, this seems wasteful, ex; child5 table would have at minimum 4 FK (id1, id2, id3, id4)

with these options I could simply do;

select child5.rootid
from child5
where child5.id = X

questions

  1. is any of my options above is a good thing to do or there is a better way of doing this?
  2. is there any name given to this pattern (so I can search online, I can't think of any)

EDIT

one example that I have, root table will be a report, in that table it will have multiple field which include a year for it.

the result set must contain all fields from childX table + the year of the report.

all tables between the root table and childX table have other fields but they should not be returned, so they are only used to go up to root and grab the year

second edit

CREATE TABLE dbo.root
        (
        root_id int NOT NULL,
        year int NOT NULL,
        field1 int NULL,
        field2 int NULL,
        fieldx int NULL
        )  ON [PRIMARY]

ALTER TABLE dbo.root ADD CONSTRAINT
        PK_root PRIMARY KEY CLUSTERED
        (
        root_id
        ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE TABLE dbo.child1
        (
        child1_id int NOT NULL,
        root_id int NOT NULL,
        field1 int NULL,
        field2 int NULL,
        fieldx int NULL
        )  ON [PRIMARY]

ALTER TABLE dbo.child1 ADD CONSTRAINT
        PK_child1 PRIMARY KEY CLUSTERED
        (
        child1_id
        ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 

ALTER TABLE dbo.child1 ADD CONSTRAINT
        FK_child1_root FOREIGN KEY
        (
        root_id
        ) REFERENCES dbo.root
        (
        root_id
        ) ON UPDATE  NO ACTION
        ON DELETE  NO ACTION
       
CREATE TABLE dbo.child2
        (
        child2_id int NOT NULL,
        child1_id int NOT NULL,
        field1 int NULL,
        field2 int NULL,
        fieldx int NULL
        )  ON [PRIMARY]

ALTER TABLE dbo.child2 ADD CONSTRAINT
        PK_child2 PRIMARY KEY CLUSTERED
        (
        child2_id
        ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 

ALTER TABLE dbo.child2 ADD CONSTRAINT
        FK_child2_child1 FOREIGN KEY
        (
        child1_id
        ) REFERENCES dbo.child1
        (
        child1_id
        ) ON UPDATE  NO ACTION
        ON DELETE  NO ACTION
       
CREATE TABLE dbo.child3
        (
        child3_id int NOT NULL,
        child2_id int NOT NULL,
        field1 int NULL,
        field2 int NULL,
        fieldx int NULL
        )  ON [PRIMARY]

ALTER TABLE dbo.child3 ADD CONSTRAINT
        PK_child3 PRIMARY KEY CLUSTERED
        (
        child3_id
        ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 

ALTER TABLE dbo.child3 ADD CONSTRAINT
        FK_child3_child2 FOREIGN KEY
        (
        child2_id
        ) REFERENCES dbo.child2
        (
        child2_id
        ) ON UPDATE  NO ACTION
        ON DELETE  NO ACTION
       
CREATE TABLE dbo.child4
        (
        child4_id int NOT NULL,
        child3_id int NOT NULL,
        field1 int NULL,
        field2 int NULL,
        fieldx int NULL
        )  ON [PRIMARY]

ALTER TABLE dbo.child4 ADD CONSTRAINT
        PK_child4 PRIMARY KEY CLUSTERED
        (
        child4_id
        ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 

ALTER TABLE dbo.child4 ADD CONSTRAINT
        FK_child4_child3 FOREIGN KEY
        (
        child3_id
        ) REFERENCES dbo.child3
        (
        child3_id
        ) ON UPDATE  NO ACTION
        ON DELETE  NO ACTION
       
CREATE TABLE dbo.child5
        (
        child5_id int NOT NULL,
        child4_id int NOT NULL,
        field1 int NULL,
        field2 int NULL,
        fieldx int NULL
        )  ON [PRIMARY]

ALTER TABLE dbo.child5 ADD CONSTRAINT
        PK_child5 PRIMARY KEY CLUSTERED
        (
        child5_id
        ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 

ALTER TABLE dbo.child5 ADD CONSTRAINT
        FK_child5_child4 FOREIGN KEY
        (
        child4_id
        ) REFERENCES dbo.child4
        (
        child4_id
        ) ON UPDATE  NO ACTION
        ON DELETE  NO ACTION

insert into root select 1,2021, null, null, null  
insert into child1 select 1,1, null, null, null  
insert into child2 select 1,1, null, null, null  
insert into child3 select 1,1, null, null, null  
insert into child4 select 1,1, null, null, null
insert into child5 select 1,1, 10, 20, 30  
insert into child5 select 2,1, 100, 200, 300

select r.year, c5.field1, c5.field2, c5.fieldx
from root r
inner join child1 c1 on r.root_id = c1.root_id
inner join child2 c2 on c1.child1_id = c2.child1_id
inner join child3 c3 on c2.child2_id = c3.child2_id
inner join child4 c4 on c3.child3_id = c4.child3_id
inner join child5 c5 on c4.child4_id = c5.child4_id
where c5.child5_id = 2
GO
year | field1 | field2 | fieldx
---: | -----: | -----: | -----:
2021 |    100 |    200 |    300

db<>fiddle here

Best Answer

From a very basic overview, I don't think your existing design is bad. In general, it's good to start out following a somewhat normalized pattern with your table design. Usually, I'll try to keep my tables thin enough to qualify the basic object they represent, usually grouping highly related fields within the same tables. If there are less used, less important, or more optional fields that describe that entity, then I'll generally refactor them to an EntityExtended table of sorts.

In your case, it sounds like you have multiple tables with unique referential constraints across each of them, which is quite normal. Without knowing the actual content of your tables and/or context, it's hard to advise if they can be restructured better. But in general, from a performance standpoint, there shouldn't be anything wrong with your design unless you're always querying from the root table down to child5. Then following a more denormalized type of pattern may make more sense, than always recreating the same multiple joins.

There's always a trade-off in both directions for the performance of reducing row sizes (to minimize the number of data pages) to the number of joins usually needed to recreate the datasets most frequently queried for. So it'll depend on your use cases and most common queries, but you can test this and see what works best for your needs. If you care to qualify your post with more specific details to contextualize your schema and type of data, we might be able to provide more specific advice.

The only other thing I can mention is regardless of performance, you may find creating views to improve the usability and structure of your database so you don't have to repeat joins useful and can reduce your codebase. Views don't typically affect performance one way or the other, but help from a maintainability perspective.