I'm designing a database that will store users of varying types. Predominantly (but not exclusively) they will be Actors, Directors and Writers. Currently there are just four user types that are of relevance. There is a outside chance that this number may increase, but the probability is low – and in such a case would be by a very small number.
The plan is to have a users
table responsible pretty much solely for logging into the site (name
, email
and password
columns plus one or two others such as whether they've been approved, and updated_at), and additional tables for each of the respective user types that each have their own unique set of columns. Only actors, for example, will have an ethnicity column, only Directors would have a bio column, and only Writers would need to provide their location. However, as I've not managed a database of this complexity before, I'm wondering how to organise a couple of aspects:
Firstly, users can be any one, or any combination, of the above types. So I understand I would need something like (for example) a director_user
table with director_id
and user_id
columns. Would this then be sufficient to be able to filter all users by role type and so on?
Secondly, most users will the option of a twitter profile and phone number. And all actors will have to include at least one URL for any of their other online actor profiles; currently there are three that they can include, but this number may increase. Am I right in assuming that a separate table for each of the possible profiles/contact methods is an optimal way to organise data?
Best Answer
According to my interpretation of your description of the relevant business context, you are dealing with a supertype-subtype1 structure where (a) Actor, Director and Writer are entity subtypes of (b) Person, their entity supertype, and (c) said subtypes are not mutually exclusive.
In this way, if you are interested in building a relational database that mirrors such a scenario accurately —and hence are expecting that it functions as such—, your following comment clarifications are quite significant with respect to the previous points, because they have implications at both (1) the conceptual and (2) the logical levels of representation of the database in question:
I will elaborate on all those aspects and several other critical factors in the sections below.
Business rules
In order to first define the corresponding conceptual schema —which can be used as a subsequent reference so that you can adapt it to make certain that it meets the exact informational requirements—, I have formulated some business rules that are of particular importance:
Expository IDEF1X diagram
Then, I created the IDEF1X4 diagram shown in Figure 1, which groups all the formulations above along with other rules that appear pertinent:
As demonstrated, the Person supertype (i) has its own box, (ii) possesses the properties or attributes that apply to all the subtypes, and (iii) presents lines that connect it with the boxes of every subtype.
In turn, every subtype (a) appears in its own dedicated box, and (b) holds only its applicable properties. The PRIMARY KEY of the supertype, PersonId, migrates5 to the subtypes with the role names6 ActorId, DirectorId, and WriterId respectively.
Also, I avoided coupling Person with the UserProfile entity type, which permits separating all their contextual implications, associations or relationships, etc. The PersonId property has migrated to UserProfile with the role name UserId.
You state in the question body that
…so URL is an entity type in its own right, and is directly associated with the Actor subtype in accordance with this quote.
And, in comments, you specify that
…then, among other features, I included Headshot as a property of the Actor entity type.
As for the Ethnicity and the Location entity types, they of course may entail more complex organizations (e.g., an Actor may belong to one, two or more different ethnic groups in distinct proportions, and a Writer may be based on a place that requires recording country, administrative region, county, etc.) but it looks like that the needs of your business context are covered successfully with the structures here modelled.
Naturally, you can make as many adjustments as necessary.
Illustrative SQL-DDL logical design
Consequently, based on the IDEF1X diagram shown and described above, I wrote the logical DDL layout that is shown as follows (I have supplied notes as comments that explain some of the characteristics that I esteem particularly important with respect to the tables, columns and constraints declared):
This has been tested in this db<>fiddle that runs on MySQL 8.0.
Therefore, (1) every singular aspect of the logical layout above carries a very precise meaning from (2) a singular feature of the business environment of interest7 —in agreement with the spirit of the relational framework by Dr. Edgar Frank Codd—, because:
Each base table represents an individual entity type.
Each column stands for a single property of the respective entity type.
A specific data type is fixed for each column in order to ensure that all the values it contains belong to a particular and properly-delimited set a , be it INT, DATETIME, CHAR, etc (and let us hope that MySQL will finally incorporate DOMAIN support in near-future version).
Multiple constraints are configured (declaratively) in order to guarantee that the assertions in form of rows retained in all the tables comply with the business rules determined at the conceptual level.
Each row is meant to convey well-defined semantics, e.g., a
Person
row is read:The Person identified by PersonId
r
is called by the FirstNames
and the LastNamet
, was born on BirthDateu
, has the GenderCodev
, tweets on the TwitterProfilew
, is reached through PhoneNumberx
, is contacted via the EmailAddressy
, and was registered on CreatedDateTimez
.Having a layout like this is decidedly favourable, as you can derive new tables (e.g., SELECT operations that gather columns FROM multiple tables with the help of the JOIN clause) that —in succession— carry a very precise meaning too (see the section entitled “Views” below).
It is woth to mention that, with this configuration, (i) a row representing a subtype instance is identified by (ii) the same PRIMARY KEY value that distinguishes the row denoting the complementary supertype occurrence. Thus, it is more than opportune to note that
With this logical design, if new subtypes are defined as relevant in your business context, you would have to declare a new base table, but that happens as well when other kinds of entity types are deemed of significance, so said situation would be, in fact, ordinary.
Views
In order to “fetch”, e.g., all the information that corresponds to an Actor, Director or Writer, you can declare some views (i.e., derived or expressible tables) so that you can SELECT directly from one single resource without having to write the concerning JOINs every time; e.g., with the VIEW declared below, you can obtain the “full” Actor information:
Of course, you may follow a similar approach in order to retrieve the “full” Director and Writer information:
The DML views here discussed are as well included in this MySQL 8.0 db<>fiddle so that you can see and test them “in action”.
Endnotes
1 In some conceptual modelling techniques, supertype-subtype associations are referred to as superclass-subclass relationships.
2 Although you mention that there exist in fact more Roles that a Person may perform, but the three you revealed are good enough to discuss the scenario exposing several important ramifications.
3 But, as you noted, in the future an Actor might eventually provide one-to-many URLs.
4 Integration Definition for Information Modeling (IDEF1X) is a highly recommendable modelling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is based on (a) the early theoretical works authored by the sole originator of the relational model of data, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.
5 The IDEF1X standard defines key migration as “The modeling process of placing the primary key of a parent or generic [i.e., a supertype] entity in its child or category entity [i.e., a subtype] as a foreign key”.
6 In IDEF1X, a role name is a distinctive label assigned to a FK attribute in order to express the meaning that it holds within the scope of its respective entity type.
7 Except, naturally, for the hypothetical conceptual properties (and logical columns) Director.Etcetera and UserProfile.Etcetera, which are merely placeholders that I used to expose the posibility of adding more properties (and columns) that apply to the corresponding conceptual entity type (and logical table).
8 E.g., appending an additional column with the AUTO_INCREMENT attribute to a table of a database “running” on MySQL.