Mysql – How to model an entity type that can have different sets of attributes

database-designMySQLsubtypes

I’m having some trouble in recreating a database with a one-to-many (1:M) relationship between Users and Items.

This is pretty straightforward, yes; however, each Item belongs to a certain Category (e.g., a Car, a Boat or a Plane), and each Category has a particular number of attributes, e.g.:

Car structure:

+----+--------------+--------------+
| PK | Attribute #1 | Attribute #2 |
+----+--------------+--------------+

Boat structure:

+----+--------------+--------------+--------------+
| PK | Attribute #1 | Attribute #2 | Attribute #3 |
+----+--------------+--------------+--------------+

Plane structure:

+----+--------------+--------------+--------------+--------------+
| PK | Attribute #1 | Attribute #2 | Attribute #3 | Attribute #4 |
+----+--------------+--------------+--------------+--------------+

Because of this diversity in the number of attributes (columns), I initially thought it would be a good idea to create one separate table for each Category, so I would avoid several NULLs and thus making a better use of the indexing.

Although it looked great at first, I couldn’t find a way to create the relationship between the Items and the Categories through the database because, at least in my modest experience as a database administrator, when creating Foreign Keys, I inform explicitly a database the table name and column.

In the end, I would like a solid structure to store all data, while having all the means to list all attributes of all Items a User may have with one query.

I could hardcode dynamic queries with the server-side language, but I feel this is wrong and not very optimal.

Additional information

These are my responses to MDCCL comments:

1. How many Item Categories of interest are there in your business context, three (i.e., Cars, Boats and Planes) or more?

In fact, it’s very simple: There are only five Categories in total.

2. Will the same Item always belong to the same User (that is, once a given Item has been “assigned” to a certain User it cannot be changed)?

No, they could change. In the fictional scenario of the question, it would be like User A sells Item #1 for User B, so the ownership must be reflected.

3. Are there attributes that are shared by some or all of the Categories?

Not shared but, from memory, I can tell that at least three attributes are present in all Categories.

4. Is there a chance that the cardinality of the relationship between User and Item is many-to-many (M:N) instead of one-to-many (1:M)? For example, in the case of following business rules: A User owns zero-one-or-many Items and An Item is owned by one-to-many Users

No, because Items would describe a physical object. Users will have a virtual copy of them, each one identified by a unique GUID v4

5. Regarding your following response to one of the question comments:

“In the fictional scenario of the question, it would be like User A sells Item #1 for User B, so the ownership must be reflected.”

It seems that you are planning to keep track of the item ownership evolution, so to speak. In this way, which attributes would you like to store about such phenomenon? Only the modification of the attribute that indicates the specific User who is the Owner of a specific Item?

No, not really. The ownership may change, but I don’t need to keep track of the previous Owner.

Best Answer

According to your description of the business environment under consideration, there exists a supertype-subtype structure that encompasses Item —the supertype— and each of its Categories, i.e., Car, Boat and Plane (along with two more that were not made known) —the subtypes—.

I will detail below the method I would employ to manage said scenario.

Business rules

In order to start delineating the relevant conceptual schema, some of the most important business rules determined so far (restricting the analysis to the three disclosed Categories only, to keep things as brief as possible) can be formulated as follows:

  • A User owns zero-one-or-many Items.
  • An Item is owned by exactly-one User at a specific instant.
  • An Item may be owned by one-to-many Users at distinct points in time.
  • An Item is classified by exactly-one Category.
  • An Item is, at all times,
    • either a Car
    • or a Boat
    • or a Plane.

Illustrative IDEF1X diagram

Figure 1 displays an IDEF1X1 diagram that I created to group the previous formulations along with other business rules that appear pertinent:

Figure 1 - Item and Categories Supertype-Subtype Structure

Supertype

On the one hand, Item, the supertype, presents the properties or attributes that are common to all the Categories, i.e.,

  • CategoryCode —specified as a FOREIGN KEY (FK) that references Category.CategoryCode and functions as a subtype discriminator, i.e., it indicates the exact Category of subtype with which a given Item must be connected—,
  • OwnerId —distinguished as a FK that points to User.UserId, but I assigned it a role name2 in order to reflect its special implications more accurately—,
  • Foo,
  • Bar,
  • Baz and
  • CreatedDateTime.

Subtypes

On the other hand, the properties that pertain to every particular Category, i.e.,

  • Qux and Corge;
  • Grault, Garply and Plugh;
  • Xyzzy, Thud, Wibble and Flob;

are shown in the corresponding subtype box.

Identifiers

Then, the Item.ItemId PRIMARY KEY (PK) has migrated3 to the subtypes with different role names, i.e.,

  • CarId,
  • BoatId and
  • PlaneId.

Mutually exclusive associations

As depicted, there is an association or relationship with a one-to-one (1:1) cardinality ratio between (a) each supertype occurrence and (b) its complementary subtype instance.

The exclusive subtype symbol portrays the fact that the subtypes are mutually exclusive, i.e., a concrete Item occurrence can be supplemented by a single subtype instance only: either one Car, or one Plane, or one Boat (never by zero or less, nor by two or more).

, I employed classic placeholder names to entitle some of the entity type properties, as their actual denominations were not supplied in the question.

Expository logical-level layout

Consequently, in order to discuss an expository logical design, I derived the following SQL-DDL statements based on the IDEF1X diagram displayed and described above:

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- Also, you should make accurate tests to define the 
-- most convenient INDEX strategies based on the exact 
-- data manipulation tendencies of your business context.

-- As one would expect, you are free to utilize 
-- your preferred (or required) naming conventions. 

CREATE TABLE UserProfile (
    UserId          INT      NOT NULL,
    FirstName       CHAR(30) NOT NULL,
    LastName        CHAR(30) NOT NULL,
    BirthDate       DATE     NOT NULL,
    GenderCode      CHAR(3)  NOT NULL,
    Username        CHAR(20) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT UserProfile_PK  PRIMARY KEY (UserId),
    CONSTRAINT UserProfile_AK1 UNIQUE ( -- Composite ALTERNATE KEY.
        FirstName,
        LastName,
        GenderCode,
        BirthDate
    ),
    CONSTRAINT UserProfile_AK2 UNIQUE (Username) -- ALTERNATE KEY.
);

CREATE TABLE Category (
    CategoryCode     CHAR(1)  NOT NULL, -- Meant to contain meaningful, short and stable values, e.g.; 'C' for 'Car'; 'B' for 'Boat'; 'P' for 'Plane'.
    Name             CHAR(30) NOT NULL,
    --
    CONSTRAINT Category_PK PRIMARY KEY (CategoryCode),
    CONSTRAINT Category_AK UNIQUE      (Name) -- ALTERNATE KEY.
);

CREATE TABLE Item ( -- Stands for the supertype.
    ItemId           INT      NOT NULL,
    OwnerId          INT      NOT NULL,
    CategoryCode     CHAR(1)  NOT NULL, -- Denotes the subtype discriminator.
    Foo              CHAR(30) NOT NULL,
    Bar              CHAR(40) NOT NULL,
    Baz              CHAR(55) NOT NULL,  
    CreatedDateTime  DATETIME NOT NULL,
    --
    CONSTRAINT Item_PK             PRIMARY KEY (ItemId),
    CONSTRAINT Item_to_Category_FK FOREIGN KEY (CategoryCode)
        REFERENCES Category    (CategoryCode),
    CONSTRAINT Item_to_User_FK     FOREIGN KEY (OwnerId)
        REFERENCES UserProfile (UserId)  
);

CREATE TABLE Car ( -- Represents one of the subtypes.
    CarId INT          NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Qux   DATE         NOT NULL,
    Corge DECIMAL(5,2) NOT NULL,   
    --
    CONSTRAINT Car_PK         PRIMARY KEY (CarId),
    CONSTRAINT Car_to_Item_FK FOREIGN KEY (CarId)
        REFERENCES Item (ItemId),
    CONSTRAINT ValidQux_CK    CHECK       (Qux >= '1990-01-01')   
);

CREATE TABLE Boat ( -- Stands for one of the subtypes.
    BoatId INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Grault SMALLINT NOT NULL,
    Garply DATETIME NOT NULL,   
    Plugh  CHAR(63) NOT NULL, 
    --
    CONSTRAINT Boat_PK         PRIMARY KEY (BoatId),
    CONSTRAINT Boat_to_Item_FK FOREIGN KEY (BoatId)
        REFERENCES Item (ItemId),
    CONSTRAINT ValidGrault_CK  CHECK       (Grault <= 10000)  
);

CREATE TABLE Plane ( -- Denotes one of the subtypes.
    PlaneId INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Xyzzy   BIGINT   NOT NULL,
    Thud    TEXT     NOT NULL,  
    Wibble  CHAR(20) NOT NULL, 
    Flob    BIT(1)   NOT NULL,   
    --
    CONSTRAINT Plane_PK         PRIMARY KEY (PlaneId),
    CONSTRAINT Plane_to_Item_PK FOREIGN KEY (PlaneId)
        REFERENCES Item (ItemId),
    CONSTRAINT ValidXyzzy_CK    CHECK       (Xyzzy <= 3258594758)
);

This has been tested in this db<>fiddle running on MySQL 8.0.

As demonstrated, the superentity type and each of the subentity types are represented by the corresponding base table.

The columns CarId, BoatId and PlaneId, constrained as the PKs of the appropriate tables, help in representing the conceptual-level one-to-one association by way of FK constraints§ that point to the ItemId column, which is constrained as the PK of the Item table. This signifies that, in an actual “pair”, both the supertype and the subtype rows are identified by the same PK value; thus, it is more than opportune to mention that

  • (a) attaching an extra column to hold system-controlled surrogate values to (b) the tables standing for the subtypes is (c) entirely superfluous.

§ In order to prevent problems and errors concerning (particularly FOREIGN) KEY constraint definitions —situation you referred to in comments—, it is very important to take into account the existence-dependency that takes place among the different tables at hand, as exemplified in the declaration order of the tables in the expository DDL structure, which I supplied in this db<>fiddle too.

E.g., appending an additional column with the AUTO_INCREMENT property to a table of a database built on MySQL.

Integrity and consistency considerations

It is critical to point out that, in your business environment, you have to (1) ensure that each “supertype” row is at all times complemented by its corresponding “subtype” counterpart, and, in turn, (2) guarantee that said “subtype” row is compatible with the value contained in the “discriminator” column of the “supertype” row.

It would be very elegant to enforce such circumstances in a declarative manner but, unfortunately, none of the major SQL platforms has provided the proper mechanisms to do so, as far as I know. Therefore, resorting to procedural code within ACID TRANSACTIONS it is quite convenient so that these conditions are always met in your database. Other option would be employing TRIGGERS, but they tend to make things untidy, so to speak.

Declaring useful views

Having a logical design like the one explained above, it would be very practical to create one or more views, i.e., derived tables that comprise columns that belong to two or more of the relevant base tables. In this way, you can, e.g., SELECT directly FROM those views without having to write all the JOINs every time you have to retrieve “combined” information.

Sample data

In this respect, let us say that the base tables are “populated” with the sample data shown below:

--
INSERT INTO UserProfile 
    (UserId, FirstName, LastName, BirthDate, GenderCode, Username, CreatedDateTime)
VALUES
    (1, 'Edgar', 'Codd', '1923-08-19', 'M', 'ted.codd', CURDATE()),
    (2, 'Michelangelo', 'Buonarroti', '1475-03-06', 'M', 'michelangelo', CURDATE()),
    (3, 'Diego', 'Velázquez', '1599-06-06', 'M', 'd.velazquez', CURDATE());

INSERT INTO Category 
    (CategoryCode, Name)
VALUES
    ('C', 'Car'), ('B', 'Boat'), ('P', 'Plane');

-- 1. ‘Full’ Car INSERTion

-- 1.1 
INSERT INTO Item
    (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
VALUES
    (1, 1, 'C', 'Motorway', 'Tire', 'Chauffeur', CURDATE());
 
 -- 1.2
INSERT INTO Car
    (CarId, Qux, Corge)
VALUES
    (1, '1999-06-11',  999.99);

-- 2. ‘Full’ Boat INSERTion

-- 2.1
INSERT INTO Item
    (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
VALUES
    (2, 2, 'B', 'Ocean', 'Anchor', 'Sailor', CURDATE());

-- 2.2
INSERT INTO Boat
    (BoatId, Grault, Garply, Plugh)
VALUES
    (2, 10000, '2016-03-09 07:32:04.000', 'So far so good.');

-- 3 ‘Full’ Plane INSERTion

-- 3.1   
INSERT INTO Item
    (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
VALUES
    (3, 3, 'P', 'Sky', 'Wing', 'Aviator', CURDATE());

-- 3.2

INSERT INTO Plane
    (PlaneId, Xyzzy, Thud, Wibble, Flob)
VALUES
    (3, 3258594758, 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut sollicitudin pharetra sem id elementum. Sed tempor hendrerit orci. Ut scelerisque pretium diam, eu sodales ante sagittis ut. Phasellus id nunc commodo, sagittis urna vitae, auctor ex. Duis elit tellus, pharetra sed ipsum sit amet, bibendum dapibus mauris. Morbi condimentum laoreet justo, quis auctor leo rutrum eu. Sed id nibh non leo sodales pulvinar. Nam ornare ipsum nunc, eget molestie nulla ultrices vel. Curabitur fermentum nisl quis lorem aliquam pretium aliquam at mauris. In vestibulum, tellus et pharetra sollicitudin, mi lacus consectetur dolor, id volutpat nulla eros a mauris. ', 'Here we go!', TRUE);

--

Then, an advantageous view is one that gathers columns from Item, Car and UserProfile:

--

CREATE VIEW CarAndOwner AS
    SELECT C.CarId,
           I.Foo,
           I.Bar,
           I.Baz,
           C.Qux,
           C.Corge,           
           U.FirstName AS OwnerFirstName,
           U.LastName  AS OwnerLastName
        FROM Item I
        JOIN Car C
          ON C.CarId = I.ItemId
        JOIN UserProfile U
          ON U.UserId = I.OwnerId;

--

Naturally, a similar approach can be followed so that you can as well SELECT the “full” Boat and Plane information straight FROM one single table (a derived one, in these cases).

After that —if you do not mind about the presence of NULL marks in result sets— with the following VIEW definition, you can, e.g., “collect” columns from the tables Item, Car, Boat, Plane and UserProfile:

--

CREATE VIEW FullItemAndOwner AS
    SELECT I.ItemId,
           I.Foo, -- Common to all Categories.
           I.Bar, -- Common to all Categories.
           I.Baz, -- Common to all Categories.
          IC.Name      AS Category,
           C.Qux,    -- Applies to Cars only.
           C.Corge,  -- Applies to Cars only.
           --
           B.Grault, -- Applies to Boats only.
           B.Garply, -- Applies to Boats only.
           B.Plugh,  -- Applies to Boats only.
           --
           P.Xyzzy,  -- Applies to Planes only.
           P.Thud,   -- Applies to Planes only.
           P.Wibble, -- Applies to Planes only.
           P.Flob,   -- Applies to Planes only.
           U.FirstName AS OwnerFirstName,
           U.LastName  AS OwnerLastName
        FROM Item I
        JOIN Category IC
          ON I.CategoryCode = IC.CategoryCode
   LEFT JOIN Car C
          ON C.CarId = I.ItemId
   LEFT JOIN Boat B
          ON B.BoatId = I.ItemId
   LEFT JOIN Plane P
          ON P.PlaneId = I.ItemId               
        JOIN UserProfile U
          ON U.UserId = I.OwnerId;

--

The code of the views here shown is only illustrative. Of course, doing some testing exercises and modifications might help to accelerate the (physical) execution of the queries at hand. In addition, you might need to remove or add columns to said views as the business needs dictate.

The sample data and all the view definitions are incorporated into this db<>fiddle so that they can be observed “in action”.

Data manipulation: Application program(s) code and column aliases

The usage of application program(s) code (if that is what you mean by “server-side specific code”) and column aliases are other significant points that you brought up in the next comments:

  • I did manage to workaround [a JOIN] issue with server-side specific code, but I really don't want to do that -AND- adding aliases to all columns might be "stressing".
  • Very well explained, thank you very much. However, as I suspected, I'll have to manipulate the result set when listing all data because of the similarities with some columns, since I don't want to use several aliases to keep the statement cleaner.

It is opportune to indicate that while using application program code is a very fitting resource to handle the presentation or graphical features —i.e., the external level of representation of a computerized information system— of data sets, it is paramount that you avoid carrying out data retrieval on a row-by-row basis to prevent execution speed issues. The objective should be to “fetch” the pertinent data sets in toto by means of the robust data manipulation instruments provided by the (precisely) set engine of the SQL platform so that you can optimize the behaviour of your system.

Furthermore, utilizing aliases to rename one or more columns within a certain scope may appear stressing but, personally, I see such resource as a very powerful tool that helps to (i) contextualize and (ii) disambiguate the meaning and intention ascribed to the concerning columns; hence, this is an aspect that should be thoroughly pondered with respect to the manipulation of the data of interest.

Similar scenarios

You might as well find of help this series of posts and this group of posts which contain my take on two other cases that include supertype-subtype associations with mutually exclusive subtypes.

I have also proposed a solution for a business environment involving a supertype-subtype cluster where the subtypes are not mutually exclusive in this (newer) answer.


Endnotes

1 Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in December 1993 by the U.S. National Institute of Standards and Technology (NIST). It is solidly based on (a) some of the theoretical works authored by the sole originator of the relational model, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.

2 In IDEF1X, a role name is a distinctive label assigned to a FK property (or attribute) in order to express the meaning that it holds within the scope of its respective entity type.

3 The IDEF1X standard defines key migration as “The modeling process of placing the primary key of a parent or generic entity in its child or category entity as a foreign key”.