Database Design – Heterogeneous Ordered List of Values

database-designnosqlrelational-theory

I have the following object model:

class A {
    List<Thing> orderedListOfThings;
}

interface Thing {        
}
class X implements Thing {
   int attr;
}
class Y implements Thing {
   String value;
}
class Z implements Thing {
   B b;
}
class B {
   ...
}

The order of the list orderedListOfThings must be preserved. This list can contains X, Y, Z instances. There isn't other Thing implementations.

  1. how this model could be design in a relational database ?

  2. Is there any No-SQL database (key-value, columnar, graph, triple store) that fits better for this kind of model ?

Best Answer

As others have mentioned, this can be done rather easily via a SubClass / Inheritance model. I would advocate something closer to what Pieter Geerkens suggested than what VDohnal suggested as I prefer not having lots of NULL fields per row, depending on the "type" that the row represents (i.e. the [Value] field has to be NULL on an X row as X does not have that property, Y does but does not have the [attr] field so [attr] has to be NULL on rows representing Y objects). I just feel that is harder to maintain over time as someone needs to be VERY familiar with the app layer object model to know if the NULL field is a valid data point for that row or to be ignored for that row. Even if you have CHECK constraints to enforce NULL, that is yet something else to maintain as you add properties to objects over time and it is still not easily visible when writing queries against these tables. Please note that the issue, in the past, would have also been a needless waste of space related to fixed-length data types at the data layer, but that has been mostly alleviated by the SPARSE option which was introduced in SQL Server 2008 (well, assuming you are using SQL Server).

Notes:

  • The essential aspect of this model is that the [Thing] table is the "base" table for your "Thing" object.
  • The [Thing] table has a [ThingType_ID] field which is not a functional requirement but is a data-dictionary / lookup-table so that you can associate meaningful information for people doing ad hoc queries or reports, etc. It also helps to determine in which subclass table to look for the extension properties, assuming there is a reason why you don't already know and are first looking in the [Thing] table for general info.
  • The [Thing] table has a [A_ID] field which is an FK back to the [A] table. This maps to your structure of having an A class that contains a collection of Thing.
  • The subclass tables of [ThingX], [ThingY], and [ThingZ] have the same PK as the base class table [Thing] (i.e. [Thing_ID]). The reason to not have separate IDENTITY values is that the subclass tables are not an entity by themselves but are merely an extension of the base [Thing]. Their identity IS the identity (the concept of identity, not the SQL keyword IDENTITY) of [Thing].
  • Going along with the prior point, the PK field of the subclass tables [ThingX], [ThingY], and [ThingZ] is also an FK back to the [Thing] table.
  • There is no need for a separate [B] table as those properties are all 1 to 1 with the definition of the [ThingZ] class / sub-table.
  • All 4 of the base-class and sub-class tables have a [ModifiedDate] audit field as it is possible to update the properties of either the base [Thing] table OR a subclass table but not both in the same operation (e.g. updating the [ThingX].[Attr] field doesn't require updating the [Thing] table).
  • Only the base class table--[Thing]--has the [CreatedDate] audit field because rows in the subclass tables cannot be created independently (because it is only the combination of the [Thing] table and one of the subclass tables that make up an object). I suppose it is possible to create a row in the [Thing] table only (i.e. no INSERT into any subclass tables) and later come back to INSERT into one of the subclass tables, but that is just silly ;-).

Example SQL using Microsoft T-SQL syntax (but no features highly-specific to SQL Server):

CREATE TABLE A
(
   A_ID  INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
   -- other fields?
   CreatedDate DATETIME NOT NULL DEFAULT (GETUTCDATE()),
   ModifiedDate DATETIME NULL
);

CREATE TABLE ThingType
(
   ThingType_ID TINYINT NOT NULL PRIMARY KEY,
   ThingType NVARCHAR(100) NOT NULL
);

CREATE TABLE Thing
(
   Thing_ID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
   ThingType_ID TINYINT NOT NULL REFERENCES ThingType (ThingType_ID), -- FK to ThingType
   A_ID INT NOT NULL REFERENCES A (A_ID), -- FK to A
   -- other fields?
   CreatedDate DATETIME NOT NULL DEFAULT (GETUTCDATE()),
   ModifiedDate DATETIME NULL
);

CREATE TABLE ThingX
(
   Thing_ID INT NOT NULL PRIMARY KEY REFERENCES Thing (Thing_ID), -- FK to Thing
   Attr INT,
   -- other fields?
   ModifiedDate DATETIME NULL
);

CREATE TABLE ThingY
(
   Thing_ID INT NOT NULL PRIMARY KEY REFERENCES Thing (Thing_ID), -- FK to Thing
   Value NVARCHAR(500),
   -- other fields?
   ModifiedDate DATETIME NULL
);

CREATE TABLE ThingZ
(
   Thing_ID INT NOT NULL PRIMARY KEY REFERENCES Thing (Thing_ID), -- FK to Thing
   Something DATETIME,
   -- other fields?
   ModifiedDate DATETIME NULL
);

Side Notes:
Just to be clear about differences between this suggestion and Pieter's (since they are very similar):

  1. I have no Table Type for [B] as I don't see a need for it
  2. My interpretation of your structure (i.e. class A having List<Thing>) is such that both of the other answers have misconstrued that relationship:
    1. Pieter's does not allow for having singular (i.e. non-repeated) properties of A while at the same time having multiple instances of Thing.
    2. VDohnal's has the relationship backwards such that a single Thing will have multiple A s. (this has since been corrected).
  3. On a practical level, I cannot see a case for doing the LEFT JOIN query at the bottom of Pieter's answer, but maybe on occasion there is. In such a case I have opted instead to query each type individually via INNER JOIN and then combining via UNION ALL (fields in one table that aren't in the other subclass table(s) need to be given literal values):

     SELECT base.field, subclass.field, subclass.[Attr]
     FROM [Thing] base
     INNER JOIN [ThingX] subclass
             ON subclass.Thing_ID = base.Thing_ID
     UNION ALL  
     SELECT base.field, subclass.field, CONVERT(INT, NULL) AS [Attr]
     FROM [Thing] base
     INNER JOIN [ThingY] subclass
             ON subclass.Thing_ID = base.Thing_ID
    

EDIT:
Here are two more notes that I thought of:

  • One benefit of this broken out design, as opposed to the single table model, is that it allows for parts of the object model to change (as business needs change over time) without impacting the parts that aren't changing. For example, adding a new property to the X object does not impact either of the [ThingY] or [ThingZ] tables nor the queries that make use of [ThingY] and [ThingZ].
  • For simplicity of getting data that represents the full X, Y, or Z objects, you can create Views that are a simple CREATE VIEW [X] AS SELECT * FROM [Thing] base INNER JOIN [ThingX] sub ON base.Thing_ID = sub.Thing_ID and so on for the Y and Z objects.

EDIT 2:
Renamed the [X], [Y], and [Z] tables to be [ThingX], [ThingY], and [ThingZ] respectively. This naming convention for the subclass tables is more representative of them being extensions and not complete objects / entities by themselves. This should also make the model (and working with the tables) conceptually clearer in that [X] (the View suggested directly above) = [Thing] + [ThingX], and so on with [Y] and [Z].

Also, the tables will be grouped together in SSMS, queries against sys.tables / sys.objects / INFORMATION_SCHEMA.TABLES and hopefully in your source control system where you have individual SQL scripts for each table :-).