Database Design – Deciding Between Complete Disjoint or Incomplete Overlapping in Supertype/Subtype


I'm building an inventory database that stores IT hardware, such as desktop computers, laptops, switches, routers, mobile phones, etc. I'm using the supertype/subtype pattern, where all devices are stored in a single table and device-specific information is put into subtype tables. My dilemma is choosing between the following two designs:


In the top diagram all devices share common subtypes. For example, desktop computers and laptops would have records in the following tables: Device, NetworkDevice. A switch would have records in: Device, NetworkDevice. A router would have records in: Device, NetworkDevice, WANDevice. Any device for which we track location will have a record in Location. Some pros and cons that I thought of for this setup:

Pro: SELECTing records based on a common field, like Hostname or LocationID, is easier.
Pro: No null fields.
Con: Tables that should be included in CRUD operations for a particular device are not obvious, and may confuse future DBAs.

In the bottom diagram all devices have their own subtype (there are more classes of device that are not shown here). In this situation, it is obvious which tables records are inserted into or selected from: desktop computers and laptops go in Computer, etc. Some pros and cons that I thought of for this setup:

Pro: It is immediately obvious which tables to use for CRUD operations for subtypes.
Pro: Only one table has to be used for CRUD operations.
Con: SELECTing records based on common subtype fields requires combining all of the subtype tables, for example when searching by Hostname or LocationID.

In both situations, the ClassDiscriminator field is placed in subtype tables for use with a CHECK constraint to control which types can be inserted.
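A minimal sketch of the ClassDiscriminator idea, using SQLite through Python's built-in sqlite3 module as a stand-in for the real DBMS. Device and Computer follow the bottom diagram. The columns DeviceID and Hostname and the discriminator values 'Desktop', 'Laptop' and 'Router' are assumptions for illustration. The CHECK limits which classes the subtype table accepts, and the composite FK keeps the subtype's discriminator in step with the Device row.

```python
import sqlite3

def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Device (
            DeviceID           INTEGER PRIMARY KEY,
            ClassDiscriminator TEXT NOT NULL,
            UNIQUE (DeviceID, ClassDiscriminator)
        );
        -- Subtype table: the CHECK limits which device classes may live here,
        -- and the composite FK keeps the discriminator in sync with Device.
        CREATE TABLE Computer (
            DeviceID           INTEGER PRIMARY KEY,
            ClassDiscriminator TEXT NOT NULL
                CHECK (ClassDiscriminator IN ('Desktop', 'Laptop')),
            Hostname           TEXT,
            FOREIGN KEY (DeviceID, ClassDiscriminator)
                REFERENCES Device (DeviceID, ClassDiscriminator)
        );
    """)
    return conn

def try_insert_computer(conn, device_id, cls):
    """Insert a Device row plus a matching Computer row; True on success."""
    try:
        with conn:  # commits on success, rolls back both inserts on error
            conn.execute("INSERT INTO Device VALUES (?, ?)", (device_id, cls))
            conn.execute("INSERT INTO Computer VALUES (?, ?, ?)",
                         (device_id, cls, f"host{device_id}"))
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_schema()
print(try_insert_computer(conn, 1, "Laptop"))  # True
print(try_insert_computer(conn, 2, "Router"))  # False: the CHECK rejects it
```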

Are there any recommendations for which design is better, or is it completely a matter of opinion and dependent on the intended purpose of the database?

EDIT: A specific question I have regards the overlapping nature of the table "NetworkDevice". This table is meant to hold network information for any device with a hostname and/or IP address, whether it is a computer, switch, or router. Is the overlapping nature of this table something that could cause problems, or is it okay to implement it this way?
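Here is one way to picture the overlapping NetworkDevice table. This is again a SQLite sketch, and the column names and sample rows are assumptions. NetworkDevice references only the Device key and has no discriminator. As a result, a laptop, a switch and a router can each own a row, and a hostname lookup takes one join whatever the device class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Device (
        DeviceID           INTEGER PRIMARY KEY,
        ClassDiscriminator TEXT NOT NULL
    );
    -- Overlapping subtype: keyed on DeviceID alone, so any class may appear.
    CREATE TABLE NetworkDevice (
        DeviceID  INTEGER PRIMARY KEY REFERENCES Device (DeviceID),
        Hostname  TEXT UNIQUE,
        IPAddress TEXT
    );
    INSERT INTO Device VALUES (1, 'Laptop'), (2, 'Switch'), (3, 'Router');
    INSERT INTO NetworkDevice VALUES
        (1, 'lap-01',   '10.0.0.11'),
        (2, 'sw-core',  '10.0.0.2'),
        (3, 'rtr-edge', '10.0.0.1');
""")

def find_by_hostname(hostname):
    """One join finds the device no matter which class it belongs to."""
    return conn.execute(
        """SELECT d.DeviceID, d.ClassDiscriminator, n.IPAddress
           FROM Device AS d
           JOIN NetworkDevice AS n ON n.DeviceID = d.DeviceID
           WHERE n.Hostname = ?""",
        (hostname,),
    ).fetchone()

print(find_by_hostname("sw-core"))  # (2, 'Switch', '10.0.0.2')
```

The schema does not say which classes are supposed to have a NetworkDevice row. In this sketch, that rule would have to live in the application (or in triggers).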

Thank you in advance for any input provided. Please ask if any additional information is needed.

Best Answer

Physical implementation of subtyping in a database is a complex issue. Unless you have a situation where it offers compelling advantages (see below for one or two examples), it adds complexity to the implementation while providing relatively little value.

Having done this with really complex subtyping (applications and sentences on a court case management system, disparate combined-risk commercial insurance contract structures), I guess I have some observations on this. Some significant corner cases are:

If the total number of database fields across the subtypes is relatively low (say, fewer than 100) or there is significant commonality between subtypes, then splitting the subtypes out into separate physical tables is probably of little value. It will add significant overhead to reporting queries and searches. In most cases it's best to have a single table and manage your subtyping within the application. (Probably the closest to your problem)
If your subtyping is very disjoint, and different subtypes have type-dependent data structures hanging off them (i.e. child tables or more complex structures), then subtype tables make sense. In this case, each subtype probably has relatively little commonality within the application (i.e. there is probably a whole subsystem within the application dedicated to that subtype). Most reporting and querying will probably occur within a given sub-type, with cross-type queries mainly being restricted to a handful of common fields. (Court case management system)
If you have a large number of subtypes with disparate attributes and/or a requirement to make this configurable then a generic structure and supplementary metadata may be more appropriate. See this SO posting for a rundown on some possible approaches. (Insurance policy administration system)
If you have a very large number of fields with little commonality across your sub-types and little requirement to query across sub-type tables (i.e. nothing much in the way of multi-way outer joins against your sub-type tables) then sub-type tables may help to manage the column sprawl. (Pathologically complex version of your problem)
Some O/R mappers may only support a particular approach to managing sub-classes.

In most cases physical sub-type tables in a DB schema are a bit of a solution in search of a problem, as they potentially have undesirable side-effects.

In your case, I assume you have a relatively modest number of sub-types and a manageable number of attributes. Your diagram and question don't indicate any intention to hang child tables off the records. I would suggest that you consider the first approach listed above: maintain a single table and manage the sub-typing within your application.
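Here is a sketch of that single-table approach. It assumes SQLite and uses a made-up set of columns and per-type rules, and REQUIRED_BY_TYPE and add_device are hypothetical names. Every device is one row. Columns that don't apply to a type stay NULL, and the application (rather than the schema) decides which fields each type needs.

```python
import sqlite3

# Hypothetical rules: which optional columns each device type must fill in.
REQUIRED_BY_TYPE = {
    "Laptop": {"Hostname"},
    "Switch": {"Hostname", "PortCount"},
    "Router": {"Hostname", "WANCircuitID"},
    "MobilePhone": {"PhoneNumber"},
}
OPTIONAL_COLUMNS = {"Hostname", "PortCount", "WANCircuitID", "PhoneNumber"}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Device (
        DeviceID     INTEGER PRIMARY KEY,
        DeviceType   TEXT NOT NULL,
        Hostname     TEXT,     -- NULL for types without a network identity
        PortCount    INTEGER,  -- switches only
        WANCircuitID TEXT,     -- routers only
        PhoneNumber  TEXT      -- mobile phones only
    )
""")

def add_device(device_type, **fields):
    """Enforce the per-type rules in application code, then insert one row."""
    unknown = set(fields) - OPTIONAL_COLUMNS  # whitelist column names
    if unknown:
        raise ValueError(f"unknown columns {sorted(unknown)}")
    supplied = {k for k, v in fields.items() if v is not None}
    missing = REQUIRED_BY_TYPE[device_type] - supplied
    if missing:
        raise ValueError(f"{device_type} is missing {sorted(missing)}")
    cols = ["DeviceType", *fields]
    placeholders = ", ".join("?" * len(cols))
    sql = f"INSERT INTO Device ({', '.join(cols)}) VALUES ({placeholders})"
    return conn.execute(sql, [device_type, *fields.values()]).lastrowid

switch_id = add_device("Switch", Hostname="sw-core", PortCount=48)
# A search on a common field is a plain single-table query.
print(conn.execute("SELECT DeviceType FROM Device WHERE Hostname = ?",
                   ("sw-core",)).fetchone())  # ('Switch',)
```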

Related Solutions

Database Design – Structuring Inventory Database with Varying Attributes

Supertype/Subtype

How about looking into the supertype/subtype pattern? Common columns go in a parent table. Each distinct type has its own table, with the ID of the parent as its own PK, containing the unique columns not common to all subtypes. You can include a type column in both the parent and child tables to ensure each device can't be more than one subtype. Make an FK between the children and the parent on (ItemID, ItemTypeID). You can then use FKs to either the supertype or subtype tables to maintain the desired integrity elsewhere. For example, if an ItemID of any type is allowed, create the FK to the parent table. If only SubItemType1 can be referenced, create the FK to that table. I would leave the TypeID out of referencing tables.
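A SQLite sketch of the exclusive-subtype arrangement described above. Item and the (ItemID, ItemTypeID) pair come from the text. The subtype Widget, the type codes 1 and 2, and the DoohickeyService referencing table are hypothetical. Each subtype pins its ItemTypeID with a CHECK. The composite FK must match the parent's single type value, so one ItemID cannot appear in two subtype tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE Item (
        ItemID     INTEGER PRIMARY KEY,
        ItemTypeID INTEGER NOT NULL CHECK (ItemTypeID IN (1, 2)),
        Name       TEXT NOT NULL,
        UNIQUE (ItemID, ItemTypeID)     -- target for the composite FKs below
    );
    CREATE TABLE Doohickey (            -- subtype for ItemTypeID 1
        ItemID     INTEGER PRIMARY KEY,
        ItemTypeID INTEGER NOT NULL DEFAULT 1 CHECK (ItemTypeID = 1),
        Torque     REAL,
        FOREIGN KEY (ItemID, ItemTypeID) REFERENCES Item (ItemID, ItemTypeID)
    );
    CREATE TABLE Widget (               -- subtype for ItemTypeID 2
        ItemID     INTEGER PRIMARY KEY,
        ItemTypeID INTEGER NOT NULL DEFAULT 2 CHECK (ItemTypeID = 2),
        Colour     TEXT,
        FOREIGN KEY (ItemID, ItemTypeID) REFERENCES Item (ItemID, ItemTypeID)
    );
    -- Only Doohickeys may be serviced, so the FK targets the subtype table,
    -- and the referencing table carries no type column of its own.
    CREATE TABLE DoohickeyService (
        ServiceID INTEGER PRIMARY KEY,
        ItemID    INTEGER NOT NULL REFERENCES Doohickey (ItemID)
    );
    INSERT INTO Item VALUES (10, 1, 'Gizmo');
    INSERT INTO Doohickey (ItemID, Torque) VALUES (10, 2.5);
""")

def rejected(sql):
    """Return True if a constraint rejects the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Item 10 has type 1; a Widget row pins its type to 2, so the composite FK
# (10, 2) finds no parent row and the insert fails.
print(rejected("INSERT INTO Widget (ItemID, Colour) VALUES (10, 'red')"))  # True
print(rejected("INSERT INTO DoohickeyService (ItemID) VALUES (10)"))       # False
```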

Naming

When it comes to naming, you have two choices as I see it (since the third choice of just "ID" is in my mind a strong anti-pattern). Either call the subtype key ItemID like it is in the parent table, or name it after the subtype, such as DoohickeyID. After some thought and some experience with this, I advocate calling it DoohickeyID. The reason is that, even though there could be some confusion about the subtype table really containing Items in disguise (rather than Doohickeys), that is a small negative compared to creating an FK to the Doohickey table and finding that the column names don't match!
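A small SQLite sketch of why matching names help. The subtype key is called DoohickeyID, and the referencing table (a hypothetical Part table) uses the same name. As a result, the FK reads naturally and joins can use USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Subtype key named after the subtype, not after the parent.
    CREATE TABLE Doohickey (
        DoohickeyID INTEGER PRIMARY KEY REFERENCES Item (ItemID),
        Torque      REAL
    );
    -- The referencing column has the same name as the key it points to.
    CREATE TABLE Part (
        PartID      INTEGER PRIMARY KEY,
        DoohickeyID INTEGER NOT NULL REFERENCES Doohickey (DoohickeyID),
        PartName    TEXT
    );
    INSERT INTO Item VALUES (1, 'Gizmo');
    INSERT INTO Doohickey VALUES (1, 2.5);
    INSERT INTO Part VALUES (100, 1, 'Sprocket');
""")

rows = conn.execute("""
    SELECT p.PartName, d.Torque
    FROM Part AS p
    JOIN Doohickey AS d USING (DoohickeyID)   -- names line up, so USING works
""").fetchall()
print(rows)  # [('Sprocket', 2.5)]
```

The only hop where the names differ is subtype-to-parent (DoohickeyID = ItemID). That is the small confusion the naming choice accepts.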

To EAV or not to EAV - My experience with an EAV database

If EAV is what you truly have to do, then it's what you have to do. But what if it weren't what you had to do?

I built an EAV database that is in use in a business. Thank God, the set of data is small (though there are dozens of item types) so the performance is not bad. But it would be bad if the database had more than a few thousand items in it! Additionally, the tables are so HARD to query. This experience has led me to really desire to avoid EAV databases in the future if at all possible.

Now, in my database I created a stored procedure that automatically builds PIVOTed views for each and every subtype that exists, so I can just query from AutoDoohickey. My metadata about the subtypes has a "ShortName" column containing an object-safe name suitable for use in view names. I even made the views updateable! You cannot update them on a join, but you CAN insert an already-existing row into them, which is converted to an UPDATE. Unfortunately, you cannot update only a few columns, because there is no way to tell the VIEW which columns you want to update in the INSERT-to-UPDATE conversion: a NULL value looks like "update this column to NULL" even if you meant "don't update this column at all."
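A rough sketch of what such a view generator does, using SQLite from Python. SQLite has no PIVOT operator, so this substitutes conditional aggregation (MAX(CASE ...)). That produces the same one-row-per-element shape as the MAX(Value) ... FOR AttrID IN (...) pivot in the view below. The metadata tables ElementType and Attr, their columns, the IDs and the sample data are assumptions. create_auto_view is a hypothetical name, and ShortName is trusted to be object-safe, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ElementType (ElementTypeID INTEGER PRIMARY KEY, ShortName TEXT NOT NULL);
    CREATE TABLE Attr (AttrID INTEGER PRIMARY KEY, ElementTypeID INTEGER, ColName TEXT);
    CREATE TABLE Element (ElementID INTEGER PRIMARY KEY, ElementTypeID INTEGER);
    CREATE TABLE Value (ElementID INTEGER, AttrID INTEGER, Value TEXT,
                        PRIMARY KEY (ElementID, AttrID));
    INSERT INTO ElementType VALUES (9, 'Module');
    INSERT INTO Attr VALUES (3, 9, 'FullName'), (4, 9, 'Ver');
    INSERT INTO Element VALUES (1, 9), (2, 9);
    INSERT INTO Value VALUES (1, 3, 'Billing'), (1, 4, '2.1'), (2, 3, 'Payroll');
""")

def create_auto_view(element_type_id):
    """Build and run CREATE VIEW Auto<ShortName> from the metadata tables."""
    (short_name,) = conn.execute(
        "SELECT ShortName FROM ElementType WHERE ElementTypeID = ?",
        (element_type_id,)).fetchone()
    attrs = conn.execute(
        "SELECT AttrID, ColName FROM Attr WHERE ElementTypeID = ? ORDER BY AttrID",
        (element_type_id,)).fetchall()
    # One MAX(CASE ...) per attribute turns rows of Value into columns.
    cols = ",\n".join(
        f"  MAX(CASE WHEN V.AttrID = {attr_id} THEN V.Value END) AS [{name}]"
        for attr_id, name in attrs)
    sql = (f"CREATE VIEW [Auto{short_name}] AS\nSELECT E.ElementID,\n{cols}\n"
           f"FROM Element E LEFT JOIN Value V ON V.ElementID = E.ElementID\n"
           f"WHERE E.ElementTypeID = {int(element_type_id)}\nGROUP BY E.ElementID")
    conn.execute(sql)
    return f"Auto{short_name}"

view = create_auto_view(9)
print(conn.execute(f"SELECT * FROM [{view}] ORDER BY ElementID").fetchall())
# [(1, 'Billing', '2.1'), (2, 'Payroll', None)]
```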

Despite all this decoration to make the EAV database easier to use, I still don't use these views in most normal querying because they are SLOW. Query predicates are not pushed all the way down to the Value table, so the engine has to build an intermediate result set of all the items of that view's type before filtering. Ouch. So I have many, many queries with many, many joins, each one going out to get a different value and so on. They perform relatively well, but ouch! Here's an example of one of the auto-generated views. The SP that creates it (and its update trigger) is one giant beast, and I'm proud of it, but it is not something you would ever want to maintain.

CREATE VIEW [dbo].[AutoModule]
AS
--This view is automatically generated by the stored procedure AutoViewCreate
SELECT
   ElementID,
   ElementTypeID,
   Convert(nvarchar(160), [3]) [FullName],
   Convert(nvarchar(1024), [435]) [Descr],
   Convert(nvarchar(255), [439]) [Comment],
   Convert(bit, [438]) [MissionCritical],
   Convert(int, [464]) [SupportGroup],
   Convert(int, [461]) [SupportHours],
   Convert(nvarchar(40), [4]) [Ver],
   Convert(bit, [28744]) [UsesJava],
   Convert(nvarchar(256), [28745]) [JavaVersions],
   Convert(bit, [28746]) [UsesIE],
   Convert(nvarchar(256), [28747]) [IEVersions],
   Convert(bit, [28748]) [UsesAcrobat],
   Convert(nvarchar(256), [28749]) [AcrobatVersions],
   Convert(bit, [28794]) [UsesDotNet],
   Convert(nvarchar(256), [28795]) [DotNetVersions],
   Convert(bit, [512]) [WebApplication],
   Convert(nvarchar(10), [433]) [IFAbbrev],
   Convert(int, [437]) [DataID],
   Convert(nvarchar(1000), [463]) [Notes],
   Convert(nvarchar(512), [523]) [DataDescription],
   Convert(nvarchar(256), [27991]) [SpecialNote],
   Convert(bit, [28932]) [Inactive],
   Convert(int, [29992]) [PatchTestedBy]
FROM (
   SELECT
      E.ElementID + 0 ElementID,
      E.ElementTypeID,
      V.AttrID,
      V.Value
   FROM
      dbo.Element E
      LEFT JOIN dbo.Value V ON E.ElementID = V.ElementID
   WHERE
      EXISTS (
         SELECT *
         FROM dbo.LayoutUsage L
         WHERE
            E.ElementTypeID = L.ElementTypeID
            AND L.AttrLayoutID = 7
      )
) X
PIVOT (
   Max(Value)
   FOR AttrID IN ([3], [435], [439], [438], [464], [461], [4], [28744], [28745], [28746], [28747], [28748], [28749], [28794], [28795], [512], [433], [437], [463], [523], [27991], [28932], [29992])
) P;
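For contrast, the "many, many joins" style of query mentioned before this view looks roughly like this. There is one LEFT JOIN to Value per attribute, each pinned to an AttrID. As a result, every join is a lookup on Value's (ElementID, AttrID) key against base tables rather than against a pre-built pivot. This is a sketch against a hypothetical minimal Element/Value schema in SQLite. Attribute IDs 3 and 4 stand in for FullName and Ver as in the view above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Element (ElementID INTEGER PRIMARY KEY, ElementTypeID INTEGER);
    CREATE TABLE Value (ElementID INTEGER, AttrID INTEGER, Value TEXT,
                        PRIMARY KEY (ElementID, AttrID));
    INSERT INTO Element VALUES (1, 9), (2, 9);
    INSERT INTO Value VALUES (1, 3, 'Billing'), (1, 4, '2.1'), (2, 3, 'Payroll');
""")

# One join per attribute; each is a lookup on the (ElementID, AttrID) key.
QUERY = """
    SELECT E.ElementID, FN.Value AS FullName, VR.Value AS Ver
    FROM Element E
    LEFT JOIN Value FN ON FN.ElementID = E.ElementID AND FN.AttrID = 3
    LEFT JOIN Value VR ON VR.ElementID = E.ElementID AND VR.AttrID = 4
    WHERE E.ElementTypeID = 9 AND FN.Value = ?
"""
print(conn.execute(QUERY, ("Billing",)).fetchall())  # [(1, 'Billing', '2.1')]
```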

Here's another type of automatically-generated view created by another stored procedure from special metadata to help find relationships between items that can have multiple paths between them (Specifically: Module->Server, Module->Cluster->Server, Module->DBMS->Server, Module->DBMS->Cluster->Server):

CREATE VIEW [dbo].[Link_Module_Server]
AS
-- This view is automatically generated by the stored procedure LinkViewCreate
SELECT
   ModuleID = A.ElementID,
   ServerID = B.ElementID
FROM
   Element A
   INNER JOIN Element B
      ON EXISTS (
         SELECT *
         FROM
            dbo.Element R1
         WHERE
            A.ElementID = R1.ElementID1
            AND B.ElementID = R1.ElementID2
            AND R1.ElementTypeID = 38
      ) OR EXISTS (
         SELECT *
         FROM
            dbo.Element R1
            INNER JOIN dbo.Element R2 ON R1.ElementID2 = R2.ElementID1
         WHERE
            A.ElementID = R1.ElementID1
            AND R1.ElementTypeID = 40
            AND B.ElementID = R2.ElementID2
            AND R2.ElementTypeID = 38
      ) OR EXISTS (
         SELECT *
         FROM
            dbo.Element R1
            INNER JOIN dbo.Element R2 ON R1.ElementID2 = R2.ElementID1
         WHERE
            A.ElementID = R1.ElementID1
            AND R1.ElementTypeID = 38
            AND B.ElementID = R2.ElementID2
            AND R2.ElementTypeID = 3122
      ) OR EXISTS (
         SELECT *
         FROM
            dbo.Element R1
            INNER JOIN dbo.Element R2 ON R1.ElementID2 = R2.ElementID1
            INNER JOIN dbo.Element C2 ON R2.ElementID2 = C2.ElementID
            INNER JOIN dbo.Element R3 ON R2.ElementID2 = R3.ElementID1
         WHERE
            A.ElementID = R1.ElementID1
            AND R1.ElementTypeID = 40
            AND C2.ElementTypeID = 3080
            AND R2.ElementTypeID = 38
            AND B.ElementID = R3.ElementID2
            AND R3.ElementTypeID = 3122
      )
WHERE
   A.ElementTypeID = 9
   AND B.ElementTypeID = 17
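The multi-path idea can also be sketched without the metadata machinery. This version is a deliberate simplification. A hypothetical Link table with text link types (HostedOn, UsesDBMS, ClusterMember) replaces the relationship Elements and numeric type IDs. It also uses a UNION of one query per path shape instead of the OR EXISTS predicates above. UNION removes duplicates when a module reaches the same server by more than one path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Element (ElementID INTEGER PRIMARY KEY, Kind TEXT);
    -- Hypothetical simplified link table (the original stores links as Elements).
    CREATE TABLE Link (FromID INTEGER, ToID INTEGER, LinkType TEXT);
    INSERT INTO Element VALUES (1, 'Module'), (2, 'Module'), (3, 'DBMS'),
        (4, 'Cluster'), (5, 'Server'), (6, 'Server'), (7, 'DBMS');
    INSERT INTO Link VALUES
        (1, 5, 'HostedOn'),        -- Module -> Server
        (1, 7, 'UsesDBMS'),        -- second route from module 1 ...
        (7, 5, 'HostedOn'),        -- ... to server 5 via a DBMS
        (2, 3, 'UsesDBMS'),        -- Module -> DBMS
        (3, 4, 'HostedOn'),        -- DBMS -> Cluster
        (4, 6, 'ClusterMember');   -- Cluster -> Server
    CREATE VIEW Link_Module_Server AS
    SELECT P.ModuleID, P.ServerID
    FROM (
        -- Module -> Server
        SELECT L1.FromID AS ModuleID, L1.ToID AS ServerID
        FROM Link L1 WHERE L1.LinkType = 'HostedOn'
        UNION
        -- Module -> DBMS -> Server, and Module -> Cluster -> Server
        SELECT L1.FromID, L2.ToID
        FROM Link L1 JOIN Link L2 ON L2.FromID = L1.ToID
        WHERE (L1.LinkType = 'UsesDBMS' AND L2.LinkType = 'HostedOn')
           OR (L1.LinkType = 'HostedOn' AND L2.LinkType = 'ClusterMember')
        UNION
        -- Module -> DBMS -> Cluster -> Server
        SELECT L1.FromID, L3.ToID
        FROM Link L1
        JOIN Link L2 ON L2.FromID = L1.ToID
        JOIN Link L3 ON L3.FromID = L2.ToID
        WHERE L1.LinkType = 'UsesDBMS' AND L2.LinkType = 'HostedOn'
          AND L3.LinkType = 'ClusterMember'
    ) P
    -- Keep only pairs that really are Module -> Server.
    JOIN Element M ON M.ElementID = P.ModuleID AND M.Kind = 'Module'
    JOIN Element S ON S.ElementID = P.ServerID AND S.Kind = 'Server';
""")

print(conn.execute("SELECT * FROM Link_Module_Server ORDER BY ModuleID").fetchall())
# [(1, 5), (2, 6)]
```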

The Hybrid Approach

If you MUST have some of the dynamic aspects of an EAV database, you could consider creating the metadata as if you had such a database, but actually using the supertype/subtype design pattern. Yes, you would have to create new tables, and add, remove, and modify columns. But with the proper pre-processing (like I did with my EAV database's Auto views) you could have real table-like objects to work with. Only, they wouldn't be as gnarly as mine, and the query optimizer could push predicates down to the base tables (read: perform well with them). There would be just one join between the supertype table and the subtype table. Your application could be set to read the metadata to discover what it is supposed to do (or it can use the auto-generated views in some cases). This protects your application code from having to be touched extensively just to add or modify things.
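A minimal sketch of the pre-processing idea, under stated assumptions. The metadata is a plain Python dict (in practice it would live in metadata tables), the subtype and column names are hypothetical, and SQLite stands in for the real DBMS. The generator creates a real subtype table plus a view that joins it to the supertype. Queries then hit ordinary columns through a single join.

```python
import sqlite3

# Hypothetical metadata: subtype name -> list of (column, SQL type).
SUBTYPES = {
    "Router": [("WANCircuitID", "TEXT"), ("PortCount", "INTEGER")],
    "Printer": [("PagesPerMinute", "INTEGER")],
}

def generate_ddl(name, columns):
    """Return DDL for one subtype table and its supertype-joined view."""
    col_defs = ",\n    ".join(f"{col} {typ}" for col, typ in columns)
    col_list = ", ".join(f"s.{col}" for col, _ in columns)
    return f"""
CREATE TABLE {name} (
    DeviceID INTEGER PRIMARY KEY REFERENCES Device (DeviceID),
    {col_defs}
);
CREATE VIEW v{name} AS
SELECT d.DeviceID, d.Hostname, {col_list}
FROM Device d JOIN {name} s ON s.DeviceID = d.DeviceID;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Device (DeviceID INTEGER PRIMARY KEY, Hostname TEXT)")
for name, cols in SUBTYPES.items():   # "pre-process" the metadata into DDL
    conn.executescript(generate_ddl(name, cols))

conn.execute("INSERT INTO Device VALUES (1, 'rtr-edge')")
conn.execute("INSERT INTO Router VALUES (1, 'CKT-42', 8)")
print(conn.execute("SELECT * FROM vRouter").fetchall())  # [(1, 'rtr-edge', 'CKT-42', 8)]
```

Adding an attribute would then mean updating the metadata and regenerating the affected objects (for example, ALTER TABLE ... ADD COLUMN plus recreating the view). That is the table and column work the paragraph above accepts as the price.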

Or, if you had a multi-level set of subtypes, just a few joins. By multi-level I mean when some subtypes share common columns, but not all, you could have a subtype table for those that is itself a supertype of a few other tables. For example, if you are storing information about Servers, Routers, and Printers, an intermediate subtype of "IP Device" could make sense.

I will give the caveat that I haven't yet built a hybrid supertype/subtype database decorated with EAV-style metadata tables, like the one I'm suggesting here, to try out in the real world. But the problems I've experienced with EAV are not small, and doing something is probably an absolute must if your database is going to be large and you want good performance without crazy expensive gigantic hardware.

In my opinion, time spent automating the use/creation/modification of real subtype tables would ultimately be time best spent. Focusing on data-driven flexibility makes EAV sound so attractive (and believe me, I love how when someone asks me for a new attribute on an element type I can add it in about 18 seconds and they can immediately start entering data on the web site). But flexibility can be accomplished in more than one way! Pre-processing is another way to do it. It's a powerful method that few people use, giving the benefits of being totally data-driven with the performance of being hard-coded.

(Note: Yes those views really are formatted like that and the PIVOT ones really do have update triggers. :) If someone is really that interested in the awful painful details of the long and complicated UPDATE trigger, let me know and I'll post a sample for you.)

And One More Idea

Put all your data in one table. Give columns generic names and then reuse/abuse them for multiple purposes. Create views over these to give them sensible names. Add columns when a suitable-data-type unused column is not available, and update your views. Despite my length going on about subtype/supertype, this may be the best way.
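A sketch of the one-table, generic-columns idea in SQLite. The table name Thing, the column names Str1, Str2 and Int1, and the type codes are hypothetical. Each view exposes the generic columns that a given type uses, under meaningful names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Thing (
        ThingID  INTEGER PRIMARY KEY,
        TypeCode TEXT NOT NULL,
        Str1 TEXT, Str2 TEXT, Int1 INTEGER   -- reused differently per type
    );
    -- For switches, Str1 is the hostname and Int1 is the port count.
    CREATE VIEW Switch AS
        SELECT ThingID AS SwitchID, Str1 AS Hostname, Int1 AS PortCount
        FROM Thing WHERE TypeCode = 'SW';
    -- For phones, Str1 is the phone number and Str2 is the carrier.
    CREATE VIEW Phone AS
        SELECT ThingID AS PhoneID, Str1 AS PhoneNumber, Str2 AS Carrier
        FROM Thing WHERE TypeCode = 'PH';
    INSERT INTO Thing VALUES (1, 'SW', 'sw-core', NULL, 48),
                             (2, 'PH', '555-0100', 'ExampleTel', NULL);
""")

print(conn.execute("SELECT * FROM Switch").fetchall())  # [(1, 'sw-core', 48)]
print(conn.execute("SELECT * FROM Phone").fetchall())   # [(2, '555-0100', 'ExampleTel')]
```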

Sql-server – Separate archive tables or soft delete for inventory database

I would say that if your users are going to need to query the Archive data, then using the bit flag or soft delete is easier. If the users don't need the data any longer, then I would go with the archive tables.

Based on your description above, I would suggest going with the soft delete version. I can tell you from experience: in one of our systems, we went with an archive schema to move older data to, and it led to nothing but issues because the users needed access to the data. So we ended up using UNION ALL in every query we had to run.

As a result of the issues, we stopped that route and moved to the soft delete, which is much easier.

We added a bit flag to all of the tables where it was needed, and then we just included it in the WHERE clause when querying the data.

One suggestion: make sure this field has a default value when you INSERT data. If you are using IsArchived, the default value on the column would be false, since you do not want rows archived immediately.
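A sketch of the soft-delete approach with an IsArchived default, in SQLite. The InventoryItem table and its columns are hypothetical. SQLite has no BIT type, so the flag is an INTEGER restricted to 0/1 by a CHECK. The ActiveInventoryItem view is an optional convenience so that everyday queries don't have to repeat the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InventoryItem (
        ItemID     INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        IsArchived INTEGER NOT NULL DEFAULT 0 CHECK (IsArchived IN (0, 1))
    );
    -- Convenience view so everyday queries don't forget the filter.
    CREATE VIEW ActiveInventoryItem AS
        SELECT ItemID, Name FROM InventoryItem WHERE IsArchived = 0;
""")

# IsArchived is omitted, so the DEFAULT of 0 (not archived) applies.
conn.executemany("INSERT INTO InventoryItem (ItemID, Name) VALUES (?, ?)",
                 [(1, "Laptop"), (2, "Old switch")])

# "Deleting" is an UPDATE; the row stays queryable for archive reports.
conn.execute("UPDATE InventoryItem SET IsArchived = 1 WHERE ItemID = 2")

print(conn.execute("SELECT Name FROM ActiveInventoryItem").fetchall())   # [('Laptop',)]
print(conn.execute(
    "SELECT Name FROM InventoryItem WHERE IsArchived = 1").fetchall())  # [('Old switch',)]
```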
