Schema design/data model for multi-level parent-child where each resource also has a one-to many with a shared resource type

database-design

Not sure if this is the right stack exchange for asking this question. If I should ask this elsewhere let me know.


Let's pretend that I have a system in which at the moment there are 3 resources that have a prent/child relationship, but it is anticipated the hierarchy might grow into 10 resources but not more:

SuperGroup
         \
          +-- Group
                  \
                   + SubGroup

For the sake of simplicity I have only kept the name field here. The parent child relationship is modelled using foreign-keys:

create table supergroup (
    id serial primary key, 
    name text
)

create table group (
    id serial primary key, 
    super_group_id int not null
    foreign key references super_group(id),
    name text
)

create table subgroup (
    id serial primary key, 
    group_id int not null
    foreign key references group(id),
    name text
)

For each group, we want to track monthly status of a property/asset they all share where the property has multiple fields.

Let's call this property specific_property and let's assume it has fields val1, val2, val3 and val4.

At the moment I have modelled this association like this:

create table specific_property (
    id   serial primary key,
    val1 text,
    val2 int,
    val3 float
    val4 boolean
)

create table supergroup_specific_properties(
    supergroup_id        int foreign key references supergroup(id) not null,
    specific_property_id int foreign key references specific_property(id) not null,
    primary key (supergroup_id, specific_property_id)
)

create table group_specific_properties(
    group_id             int foreign key references group(id) not null,
    specific_property_id int foreign key references specific_property(id) not null,
    primary key (group_id, specific_property_id)
)

create table subgroup_specific_properties(
    subgroup_id          int foreign key references supergroup(id) not null,
    specific_property_id int foreign key references specific_property(id) not null,
    primary key (subgroup_id, specific_property_id)
)

One use case that I have is returning all specific_properties for a super group and all of its descendants:

select * from supergroup super
inner join group group                            on (group.supergroup_id = super.id)
inner join subgroup sub                           on (sub.subgroup_id = group.id)

inner join supergroup_specific_properties supersp on (supersp.supergroup_id = super.id)
inner join group_specific_properties gsp          on (gsp.supergroup_id = group.id)
inner join subgroup_specific_properties subsp     on (subsp.supergroup_id = sub.id)

inner join specific_property sp1                  on (sp1.id = supersp.specific_property_id)
inner join specific_property sp2                  on (sp2.id = gsp.specific_property_id)
inner join specific_property sp3                  on (sp3.id = subsp.specific_property_id)

The nice thing about this model, if I am not missing anything here, is that it is normalized (3NF I suppose). The ugly thing here is that I have to create one join table per group type and search queries become join heavy and a bit complex.


An alternative model could have been:

create table specific_property (
    id          serial primary key,
    val1        text,
    val2        int,
    val3        float
    val4        boolean,
    group_type  enum('supergroup', 'group', 'subgroup'),
    group_id    int 
)

The nice thing about this is that it is very simple and easy to use. The problem is I don't have referential integrity for group_id.


A second alternative could look like this:

create table supergroup_specific_properties(
    id                   serial primary key,
    supergroup_id        int foreign key references supergroup(id) not null,
    val1                 text,
    val2                 int,
    val3                 float
    val4                 boolean
)

create table group_specific_properties(
    id                   serial primary key,
    supergroup_id        int foreign key references group(id) not null,
    val1                 text,
    val2                 int,
    val3                 float
    val4                 boolean
)

create table subgroup_specific_properties(
    id                   serial primary key,
    supergroup_id        int foreign key references subroup(id) not null,
    val1                 text,
    val2                 int,
    val3                 float
    val4                 boolean
)

but this one has a lot of duplication and more difficult to evolve and maintain.


Is it possible to objectively choose one of them (or design an alternative) that:

  • Is easy to use
  • Protects data integrity and avoids anomalies
  • Is performant for my given use case and does not become difficult for the planner to optimize when the hierarchy grows

Best Answer

Some food for thought:

create table type_of_group
( type_of_group_code char(3) not null primary key
);

insert into type_of_group (type_of_group_code)
values ('SUP'),('GRP'),('SUB');

create table hierarchy_group -- in lack of a better name
( group_id serial primary key
, type_of_group_code char(1) not null 
      references type_of_group (type_of_group_code)
, name_of_group text not null
, unique (type_of_group_code, group_id)
)

create table supergroup 
( group_id int not null primary key
, type_of_group_code char(1) DEFAULT 'SUP' not null
, check (type_of_group_code = 'SUP')
,    foreign key (type_of_group_code, group_id)
     references hierarchy_group (type_of_group_code, group_id)
);

create table group 
( -- similar as supergroup
...
) 

create table subgroup 
( -- similar as supergroup
...
) 

Depending on the relationship between properties and group you would need one or two tables (can a group have more than one property?).

create table specific_properties
( ...
      references hierarchy_group (group_id)

or

create table specific_properties
( ...

create table group_specific_properties
( ...
     references specific_properties (...)
  ...
     references hierarchy_group (group_id)

On the other hand, if different types of groups only differ in hierarchical position you can cope with just one table and a level attribute:

create table hierarchy_group -- in lack of a better name
( group_id serial primary key
, parent_id int not null
, group_level int not null
, parent_group_level int not null
, unique (group_id, group_level)
, foreign key (parent_id, parent_group_level)
      references hierarchy_group (...)

a set of constraints that guarantees correct hierarchy such as:

    (group_level = 0) => (group_id = parent_group_id)

which translates to:

    CHECK ( group_level > 0 OR (group_id = parent_group_id) )

and (translations to sql is left as an exercise for the reader :-)

    (group_level > 0) => (group_level = parent_group_level + 1)

    (group_level between 0 AND 2)