Design for hierarchical embedded database

database-designhierarchynormalizationoptimizationselect

The goal is to construct a hierarchical database of Objects having the following format:

TypeA -> ObjectA [values, contains ObjectB*, ObjectC*, etc.]
TypeB -> ObjectB [values, contains ObjectC*, ObjectD*, etc.]
TypeC -> ObjectC [values, ...]
TypeD -> ObjectD [values, ...]
TypeE -> ObjectE [values, ...]

edit: Each type can contain multiple objects of other types. eg. ObjectA can contain Multiple typeB, typeC. etc.

It is fixed and known what kind of values and objects each Type[A,B,C,…] may contain.
Select or searches can be on on all objects of a certain type, and the system would construct an object containing all the sub-objects referenced by that object.
Example: select all objects of TypeA -> returns :

ObjectA {
  values,
  ObjectB { values, ObjectC, ObjectD, etc. }
  ObjectC { values, ... }
  ...
 }

Doing this would mean a large number of nested select queries on the table. This being an embedded database, it is needed that the number of queries be minimum and there be no large joins, is there any way to partition the table or create indexes or a design principle that would help in optimizing the queries?

Best Answer

With the restrictions you have supplied and if you want to implement this is DBMS, I think you could use a supertype/subtype pattern for the 5 (or more) types of objects and only one additional table for the "object contains objects" list:

-- auxiliary table that has only 5 rows, one for each type
CREATE TABLE types
( level TINYINT NOT NULL PRIMARY KEY
, type_name VARCHAR(10) NOT NULL UNIQUE
) ;

INSERT INTO types (level, type_name)
VALUES (1,'A'), (2,'B'), (3,'C'), (4,'D'), (5,'E') ; 

-- this is the supertype
CREATE TABLE objects
( object_id INT NOT NULL UNIQUE
, level TINYINT NOT NULL REFERENCES types (type_id)
, UNIQUE (level, object_id)
) ;

-- and the subtypes
CREATE TABLE a
( object_id INT NOT NULL PRIMARY KEY 
, level TINYINT NOT NULL CHECK (level = 1)
--- various columns
, FOREIGN KEY (level, object_id) 
    REFERENCES objects (level, object_id)
) ;

--- and similarly for the other 4 tables
--- all referencing objects table

and finally the list table, which is simple many to many junction table with only an additional constraint to ensure that an object of higher level can contain only objects of lower levels:

CREATE TABLE contains
( container_level TINYINT NOT NULL
, container_id INT NOT NULL 
, item_level TINYINT NOT NULL
, item_id INT NOT NULL 
, PRIMARY KEY (container_id, item_id)
, FOREIGN KEY (container_level, container_id) 
    REFERENCES objects (level, object_id)
, FOREIGN KEY (item_level, item_id) 
    REFERENCES objects (level, object_id)
, CHECK (container_level > item_level)
) ;

SQLite has CHECK constraints and has recently implemented recursive CTEs, if you want to get the results of an object and all the objects underneath it in one query.

From a fast search, H2 also supports recursive CTEs but Derby does not.