Mysql – Modelling a database structure for multiple user types and their contact information

database-designMySQL

I'm designing a database that will store users of varying types. Predominantly (but not exclusively) they will be Actors, Directors and Writers. Currently there are just four user types that are of relevance. There is a outside chance that this number may increase, but the probability is low – and in such a case would be by a very small number.

The plan is to have a users table responsible pretty much solely for logging into the site (name, email and password columns plus one or two others such as whether they've been approved, and updated_at), and additional tables for each of the respective user types that each have their own unique set of columns. Only actors, for example, will have an ethnicity column, only Directors would have a bio column, and only Writers would need to provide their location. However, as I've not managed a database of this complexity before, I'm wondering how to organise a couple of aspects:

Firstly, users can be any one, or any combination, of the above types. So I understand I would need something like (for example) a director_user table with director_id and user_id columns. Would this then be sufficient to be able to filter all users by role type and so on?

Secondly, most users will the option of a twitter profile and phone number. And all actors will have to include at least one URL for any of their other online actor profiles; currently there are three that they can include, but this number may increase. Am I right in assuming that a separate table for each of the possible profiles/contact methods is an optimal way to organise data?

Best Answer

According to my interpretation of your description of the relevant business context, you are dealing with a supertype-subtype1 structure where (a) Actor, Director and Writer are entity subtypes of (b) Person, their entity supertype, and (c) said subtypes are not mutually exclusive.

In this way, if you are interested in building a relational database that mirrors such a scenario accurately —and hence are expecting that it functions as such—, your following comment clarifications are quite significant with respect to the previous points, because they have implications at both (1) the conceptual and (2) the logical levels of representation of the database in question:

  • […] additional tables for each of the respective user types that each have their own unique set of columns.
  • […] there are just four user types that are of relevance. There is a outside chance that this number may increase, but the probability is low - and in such a case would be by a very small number.

I will elaborate on all those aspects and several other critical factors in the sections below.

Business rules

In order to first define the corresponding conceptual schema —which can be used as a subsequent reference so that you can adapt it to make certain that it meets the exact informational requirements—, I have formulated some business rules that are of particular importance:

  • A Person may carry out one-two-or-three (i.e., one-to-all) Roles2. In other words, a Person may be
    • an Actor and/or
    • a Director and/or
    • a Writer.
  • A Person may log in via zero-or-one UserProfile.
  • An Actor provides one-two-or-three URLs3.
  • An Actor is grouped by one Ethnicity.
  • An Ethnicity groups zero-one-or-many Actors.
  • A Writer is based in one Location.
  • A Location is the base of zero-one-or-more Writers.

Expository IDEF1X diagram

Then, I created the IDEF1X4 diagram shown in Figure 1, which groups all the formulations above along with other rules that appear pertinent:

Figure 1 - IDEF1X Diagram for Person Roles and Contact Details in Cinema

As demonstrated, the Person supertype (i) has its own box, (ii) possesses the properties or attributes that apply to all the subtypes, and (iii) presents lines that connect it with the boxes of every subtype.

In turn, every subtype (a) appears in its own dedicated box, and (b) holds only its applicable properties. The PRIMARY KEY of the supertype, PersonId, migrates5 to the subtypes with the role names6 ActorId, DirectorId, and WriterId respectively.

Also, I avoided coupling Person with the UserProfile entity type, which permits separating all their contextual implications, associations or relationships, etc. The PersonId property has migrated to UserProfile with the role name UserId.

You state in the question body that

And all actors will have to include at least one URL for any of their other online actor profiles; currently there are three that they can include, but this number may increase.

…so URL is an entity type in its own right, and is directly associated with the Actor subtype in accordance with this quote.

And, in comments, you specify that

[…] an actor will have a headshot (photo), while a writer will not […]

…then, among other features, I included Headshot as a property of the Actor entity type.

As for the Ethnicity and the Location entity types, they of course may entail more complex organizations (e.g., an Actor may belong to one, two or more different ethnic groups in distinct proportions, and a Writer may be based on a place that requires recording country, administrative region, county, etc.) but it looks like that the needs of your business context are covered successfully with the structures here modelled.

Naturally, you can make as many adjustments as necessary.

Illustrative SQL-DDL logical design

Consequently, based on the IDEF1X diagram shown and described above, I wrote the logical DDL layout that is shown as follows (I have supplied notes as comments that explain some of the characteristics that I esteem particularly important with respect to the tables, columns and constraints declared):

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- Also, you should make accurate tests to define the 
-- most convenient INDEX strategies based on the exact 
-- data manipulation tendencies of your business needs.

-- As one would expect, you are free to utilize 
-- your preferred (or required) naming conventions. 

CREATE TABLE Person ( -- Represents the supertype.
    PersonId        INT      NOT NULL,
    FirstName       CHAR(30) NOT NULL,
    LastName        CHAR(30) NOT NULL,
    BirthDate       DATE     NOT NULL,
    GenderCode      CHAR(3)  NOT NULL,
    TwitterProfile  CHAR(30) NOT NULL,
    PhoneNumber     CHAR(30) NOT NULL,
    EmailAddress    CHAR(30) NOT NULL,  
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Person_PK  PRIMARY KEY (PersonId),
    CONSTRAINT Person_AK1 UNIQUE ( -- Composite ALTERNATE KEY.
        FirstName,
        LastName,
        GenderCode,
        BirthDate
    ),
    CONSTRAINT Person_AK2 UNIQUE (TwitterProfile), -- ALTERNATE KEY.
    CONSTRAINT Person_AK3 UNIQUE (EmailAddress)    -- ALTERNATE KEY.
);

CREATE TABLE Ethnicity ( -- Its rows will serve a “look-up” purpose.
    EthnicityId     INT      NOT NULL,
    Name            CHAR(30) NOT NULL,  
    Description     CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    -- 
    CONSTRAINT Ethnicity_PK PRIMARY KEY (EthnicityId),
    CONSTRAINT Ethnicity_AK UNIQUE      (Description)   
);

CREATE TABLE Actor ( -- Stands for one of the subtypes.
    ActorId         INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Headshot        CHAR(30) NOT NULL, -- May, e.g., contain a URL indicating the path where the photo file is actually stored. 
    EthnicityId     INT      NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    -- 
    CONSTRAINT Actor_PK            PRIMARY KEY (ActorId),
    CONSTRAINT ActorToPerson_FK    FOREIGN KEY (ActorId)
        REFERENCES Person (PersonId),
    CONSTRAINT ActorToEthnicity_FK FOREIGN KEY (EthnicityId)
        REFERENCES Ethnicity (EthnicityId)   
);

CREATE TABLE Director ( -- Denotes one of the subtypes
    DirectorId      INT       NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Bio             CHAR(120) NOT NULL,  
    Etcetera        CHAR(30)  NOT NULL,
    CreatedDateTime DATETIME  NOT NULL,
    -- 
    CONSTRAINT Director_PK         PRIMARY KEY (DirectorId),
    CONSTRAINT DirectorToPerson_FK FOREIGN KEY (DirectorId)
        REFERENCES Person (PersonId)   
);

CREATE TABLE Country (
    CountryCode     CHAR(2)  NOT NULL,
    Name            CHAR(30) NOT NULL,  
    CreatedDateTime DATETIME NOT NULL,
    -- 
    CONSTRAINT Country_PK PRIMARY KEY (CountryCode),
    CONSTRAINT Country_AK UNIQUE      (Name)   
);

CREATE TABLE Location ( -- Its rows will serve a “look-up” purpose.
    CountryCode     CHAR(2)  NOT NULL,
    LocationCode    CHAR(3)  NOT NULL,
    Name            CHAR(30) NOT NULL,  
    CreatedDateTime DATETIME NOT NULL,
    -- 
    CONSTRAINT Location_PK PRIMARY KEY (CountryCode, LocationCode),
    CONSTRAINT Location_AK UNIQUE      (CountryCode, Name)   
);

CREATE TABLE Writer ( -- Represents one of the subtypes.
    WriterId        INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    CountryCode     CHAR(2)  NOT NULL,
    LocationCode    CHAR(3)  NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    -- 
    CONSTRAINT Writer_PK           PRIMARY KEY (WriterId),
    CONSTRAINT WriterToPerson_FK   FOREIGN KEY (WriterId)
        REFERENCES Person (PersonId),
    CONSTRAINT WriterToLocation_FK FOREIGN KEY (CountryCode, LocationCode)
        REFERENCES Location (CountryCode, LocationCode)  
);

CREATE TABLE UserProfile (
    UserId          INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    UserName        CHAR(30) NOT NULL,
    Etcetera        CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    -- 
    CONSTRAINT UserProfile_PK         PRIMARY KEY (UserId),
    CONSTRAINT UserProfile_AK         UNIQUE      (UserName), -- ALTERNATE KEY.
    CONSTRAINT UserProfileToPerson_FK FOREIGN KEY (UserId)
        REFERENCES Person (PersonId)    
);

CREATE TABLE URL (
    ActorId       INT      NOT NULL,
    Address       CHAR(90) NOT NULL,
    Etcetera      CHAR(30) NOT NULL,
    AddedDateTime DATETIME NOT NULL,
    -- 
    CONSTRAINT URL_PK        PRIMARY KEY (ActorId, Address), -- Composite PRIMARY KEY.
    CONSTRAINT URLtoActor_FK FOREIGN KEY (ActorId)
        REFERENCES Actor (ActorId)
);

This has been tested in this db<>fiddle that runs on MySQL 8.0.

Therefore, (1) every singular aspect of the logical layout above carries a very precise meaning from (2) a singular feature of the business environment of interest7 —in agreement with the spirit of the relational framework by Dr. Edgar Frank Codd—, because:

  • Each base table represents an individual entity type.

  • Each column stands for a single property of the respective entity type.

  • A specific data type is fixed for each column in order to ensure that all the values it contains belong to a particular and properly-delimited set a , be it INT, DATETIME, CHAR, etc (and let us hope that MySQL will finally incorporate DOMAIN support in near-future version).

  • Multiple constraints are configured (declaratively) in order to guarantee that the assertions in form of rows retained in all the tables comply with the business rules determined at the conceptual level.

  • Each row is meant to convey well-defined semantics, e.g., a Person row is read:

    The Person identified by PersonId r is called by the FirstName s and the LastName t, was born on BirthDate u, has the GenderCode v, tweets on the TwitterProfile w, is reached through PhoneNumber x, is contacted via the EmailAddress y, and was registered on CreatedDateTime z.

Having a layout like this is decidedly favourable, as you can derive new tables (e.g., SELECT operations that gather columns FROM multiple tables with the help of the JOIN clause) that —in succession— carry a very precise meaning too (see the section entitled “Views” below).

It is woth to mention that, with this configuration, (i) a row representing a subtype instance is identified by (ii) the same PRIMARY KEY value that distinguishes the row denoting the complementary supertype occurrence. Thus, it is more than opportune to note that

  • (a) attaching an extra column to hold system-generated and system-assigned surrogates8 to (b) the tables standing for the subtypes is (c) entirely superfluous.

With this logical design, if new subtypes are defined as relevant in your business context, you would have to declare a new base table, but that happens as well when other kinds of entity types are deemed of significance, so said situation would be, in fact, ordinary.

Views

In order to “fetch”, e.g., all the information that corresponds to an Actor, Director or Writer, you can declare some views (i.e., derived or expressible tables) so that you can SELECT directly from one single resource without having to write the concerning JOINs every time; e.g., with the VIEW declared below, you can obtain the “full” Actor information:

--
CREATE VIEW FullActor AS

    SELECT P.FirstName,
           P.Lastname,
           P.BirthDate,
           P.GenderCode,
           P.TwitterProfile,
           P.PhoneNumber,
           P.EmailAddress,
           A.Headshot,
           E.Name AS Ethnicity
         FROM Person P
         JOIN Actor A
           ON A.ActorId     = P.PersonId
         JOIN Ethnicity E
           ON E.EthnicityId = A.EthnicityId;
--

Of course, you may follow a similar approach in order to retrieve the “full” Director and Writer information:

--
CREATE VIEW FullDirector AS

    SELECT P.FirstName,
           P.Lastname,
           P.BirthDate,
           P.GenderCode,
           P.TwitterProfile,
           P.PhoneNumber,
           P.EmailAddress,
           D.Bio,
           D.Etcetera
         FROM Person P
         JOIN Director D
           ON D.DirectorId = P.PersonId; 

--
CREATE VIEW FullWriter AS

    SELECT P.FirstName,
           P.Lastname,
           P.BirthDate,
           P.GenderCode,
           P.TwitterProfile,
           P.PhoneNumber,
           P.EmailAddress,
           L.Name AS Location,
           C.Name AS Country
         FROM Person P
         JOIN Writer W
           ON W.WriterId     = P.PersonId
         JOIN Country C
           ON C.CountryCode  = W.CountryCode
         JOIN Location L
           ON L.LocationCode = W.LocationCode;   
--

The DML views here discussed are as well included in this MySQL 8.0 db<>fiddle so that you can see and test them “in action”.


Endnotes

1 In some conceptual modelling techniques, supertype-subtype associations are referred to as superclass-subclass relationships.

2 Although you mention that there exist in fact more Roles that a Person may perform, but the three you revealed are good enough to discuss the scenario exposing several important ramifications.

3 But, as you noted, in the future an Actor might eventually provide one-to-many URLs.

4 Integration Definition for Information Modeling (IDEF1X) is a highly recommendable modelling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is based on (a) the early theoretical works authored by the sole originator of the relational model of data, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.

5 The IDEF1X standard defines key migration as “The modeling process of placing the primary key of a parent or generic [i.e., a supertype] entity in its child or category entity [i.e., a subtype] as a foreign key”.

6 In IDEF1X, a role name is a distinctive label assigned to a FK attribute in order to express the meaning that it holds within the scope of its respective entity type.

7 Except, naturally, for the hypothetical conceptual properties (and logical columns) Director.Etcetera and UserProfile.Etcetera, which are merely placeholders that I used to expose the posibility of adding more properties (and columns) that apply to the corresponding conceptual entity type (and logical table).

8 E.g., appending an additional column with the AUTO_INCREMENT attribute to a table of a database “running” on MySQL.