Relationships between subtypes of an entity

database-designoraclereferential-integritysubtypes

I have a very big (hundred of milions rows) table which represent an entity (event) having many subtypes (different kind of recorded events).
Each row is identified by an id (obj_id) and has an attribute obj_type that records the type of event.

All the subtypes share the most important attributes (represented on the main table) but a few of them (a minority) have aditional attributes that are represented in additional children tables.

I have to represent new relationships involving rows from the main table. For example record that an event of type 'A' is in relation with an event of type B and an event of type 'C' or 'D', or an event of type 'X' or 'Y' is in relation with an event of type 'Z'.

Which one of the two schemas below is better? (Is any of the two not advisable?)

Schema 1:

create table supertype(  
obj_id varchar2(30) primary key,  
obj_type varchar2(1) check (obj_type in ('A','B','C','D',..)),  
attribute1  varchar2(1),  
attribute2  varchar2(1),  
attribute3  varchar2(1),
...  
attributen  varchar2(1)
)  


create table relation_A_with_B_and_cd(  
obj_id_subtype_A varchar2(30) primary key  references supertype(obj_id),  
obj_id_reference_to_subtype_B varchar2(30) references supertype(obj_id),  
obj_id_reference_to_subtype_C_or_D varchar2(30) references supertype(obj_id),  
attribute_det_a1  varchar2(1),  
attribute_det_a2  varchar2(1),
)   

create table relation_XY_with_Z(  
obj_id_subtype_X_or_Y varchar2(30) primary key  references supertype(obj_id),  
obj_id_reference_to_subtype_Z varchar2(30) references supertype(obj_id),  
attribute_1 varchar2(1),  
attribute_2  varchar2(1),
)   

Schema 2:

create table supertype(  
obj_id varchar2(30) primary key,  
obj_type varchar2(1) check (obj_type in ('A','B','C','D',...)),  
attribute1  varchar2(1),  
attribute2  varchar2(1),  
attribute3  varchar2(1)  
)  
alter table supertype add constraint UQ_type unique(obj_id,obj_type)   

create table relation_A_with_B_and_cd(  
obj_id_subtype_A varchar2(30) primary key ,  
obj_type_subtype_A varchar2(1) check (obj_type in ('A')),  
obj_id_reference_to_subtype_B varchar2(30) ,  
obj_type_reference_to_subtype_B varchar2(1) check (obj_type_reference_to_subtype_B in ('B')),  
obj_id_reference_to_subtype_CD varchar2(30) ,  
obj_type reference_to_subtype_CD varchar2(1) check (obj_type reference_to_subtype_CD in ('C','D')),   
attribute_det_a1  varchar2(1),  
attribute_det_a2  varchar2(1),  
constraint fk_subtype_a foreign key (obj_id_subtype_A,obj_type_subtype_A) references supertype  (obj_id,obj_type),  
constraint fk_ref_subtype_B foreign key (obj_id_reference_to_subtype_B ,obj_type_reference_to_subtype_B) references supertype  (obj_id,obj_type),  
constraint fk_ref_subtype_CD foreign key (obj_id_reference_to_subtype_CD ,obj_type_reference_to_subtype_CD) references supertype  (obj_id,obj_type),  
)   

create table relation_XY_with_Z(  
obj_id_subtype_XY varchar2(30) primary key,  
obj_type_subtype_XY varchar2(1) check (obj_type in ('X','Y')),  
obj_id_reference_to_subtype_Z varchar2(30) ,  
obj_type_reference_to_subtype_Z varchar2(1)  check (obj_type reference_to_subtype_Z in ('Z')),    
attribute_1 varchar2(1),   
attribute_2  varchar2(1),  
constraint fk_subtype_XY foreign key (obj_id_subtype_XY,obj_type_subtype_XY) references supertype  (obj_id,obj_type),  
constraint fk_ref_subtype_Z foreign key (obj_id_reference_to_subtype_Z ,obj_type reference_to_subtype_X) references supertype  (obj_id,obj_type)  
) 

Considerations

  • The second one duplicates some information but the constraint ensures the object ids in the relations are of the correct subtype. It leaves me perplexed however to declare unique (obj_id,obj_type) since this is not a minimal key, because obj_id is already the primary key.

  • The first one doesn't duplicate data but doesn't perform any check on the correctness of event types participating in the relation.

Best Answer

When implementing subtypes of an entity there are various solutions on Oracle:

  1. Create one wide table with the attributes of all subentities. Entity type stored in a field and used to enforce business rules ("You may only fill column A for entity type B").
  2. Create one master table with the shared attributes and per subentity another table. And relations back to the master table.
  3. Create one table for each subtype and include the shared columns.

Here you are suggesting option 2. Option 2 keeps you master table rather narrow in terms of columns and might save some storage. However, it has serious drawback when you want to enforce business rules declaratively or through PL/SQL triggers or alike.

For a very large table and option 2 I would recommend against using declarative constraints, even when I am a lover of robust databases. It can get really slow, you need to appropiate indexes for locking and parallel loading high volumes can be challenging in terms of performance and parallelization.

For a very large table I would recommend either option 3 or option 1. Option 1 is rather simple to understand and given the large number of columns Oracle allows in one table, it shall not be a problem. Storage should not be an issue; trailing null columns are not stored at all in data blocks and null values within the row itself only require 1 byte per column to signal "hi, I have no value for this column". For option 1 you can write well-performing and parallellizable views that select only one or more of the subtypes. Option 1 also can be loaded fast in parallel. And you can implement many rules requiring a fk as a check constraint on a row, which does not involve locking or other problems hindering high volume use.

Unless... of course when one of the subtypes have comparetively very wide columns compared to the rest. That would make each of the data blocks contain considerably less rows. In that case I would recommend option 3, since they are such wide differences and use 'union all' constructs to run queries on the whole supertype. A mix can be option 3 and option 1: option 3 for the subtypes with optional columns with lots of data (varchar2(4000) filled nicely) and option 1 for the other subtypes.