Modeling a scenario in which each Music Artist is either a Group or a Solo Performer

database-designdatabase-diagramserdmany-to-manysubtypes

I have to design an entity-relationship diagram (ERD) for a business context that involves the delineation of music artists, as I will detail below.

Scenario description

  • An Artist has a Name, and must be either a Group or a Solo Performer (but not both).

  • A Group is made up of one or more Solo Performers and has a Number of Members (which should be calculated from the number of Solo Performers making up the Group).

  • A Solo Performer may be a Member of many Groups or of no Group and may play one or more Instruments.

Question

How to construct an ERD to represent such scenario? I’m confused with the ‘or’ part of it.

Best Answer

The part of the scenario that you are confused with can be modeled with a classic construct called supertype-subtype1 structure.

I will (1) introduce some pertinent preliminary ideas, (2) detail how I would delineate —at the conceptual level— the business context under consideration, and (3) provide additional related material —e.g., the corresponding logical-level representation via SQL-DDL declarations— as follows.

Introduction

A structure of this nature takes place when, in a given business environment, there is a cluster of entity types within which the supertype has one or more properties (or attributes) that are shared by the rest of the entity types in the cluster, i.e., the subtypes. Every subtype has, in turn, a particular set of properties that are applicable to itself only.

Supertype-subtype clusters can be of two kinds:

  • Exclusive. Comes about when an instance of the superentity type must always have one and only one subtype counterpart; therefore, the potential subtype occurrences in question are mutually exclusive. This is the kind that concerns to your scenario.

    A typical case in which an exclusive supertype-subtype comes about is a business domain where both an Organization and a Person are considered Legal Parties, like in the situation deliberated in this series of posts.

  • Nonexclusive. Presents itself when a supertype instance may be complemented by multiple subtype occurrences, each of which is compelled to be of a different category.

    An example of this kind of supertype-subtype is dealt with in these posts.

Notes: It is worth mentioning that supertype-subtype structures —being elements of a conceptual character— do not belong to a specific data management theoretical framework, be it relational, network or hierarchical —each of which offers particular structures to represent conceptual elements—.

It is also opportune to point out that although supertype-subtype clusters bear a certain resemblance to object-oriented application programming (OOP) inheritance and polymorphism, they are in fact distinct devices because they serve different purposes. In a database conceptual model —that must represent real world aspects— one deals with structural features in order to describe informational requirements, whereas in OOP polymorphism and inheritance, among other things, one (a) sketches and (b) implements computational and behavioural characteristics, aspects that decidedly belong to appplication program design and programming.

Apart from that, an individual OOP class —being an application program component—, does not necessarily have to “mirror” the structure of an individual entity type that belongs to the conceptual level of the database at hand. In this respect, an application programmer may typically create, e.g., one single class that “combines” all the properties of two (or more) different conceptual-level entity types, and such a class may as well include computed properties.

Using entity-relationship constructs to represent a conceptual model with supertype-subtype structures

You asked for an entity-relationship diagram (ERD for brevity) but, although being an extraordinary modeling platform, the original method —as introduced by Dr. Peter Pin-Shan Chen2— did not supply enough constructs to represent scenarios of the sort being discussed with the precision that a proper database conceptual model requires.

Consequently, it was necessary to make some extensions to said method, situation that yielded results in the development of an approach that assists in the creation of enhanced entity-relationship diagrams (EERDs) that, naturally, enriched the initial diagramming technique with new expressive characteristics. One of those characteristics is, precisely, the possibility of depicting supertype-subtype structures.

Modeling your context of interest

The illustration shown in Figure 1 is an EERD (using symbols similar to the ones proposed by Ramez A. Elmasri and Shamkant B. Navathe3, who refer to such structures as superclass/subclass) where I modeled the business domain you describe considering all the specifications. It is also available as a PDF that can be downloaded from Dropbox.

As you can see in the aforementioned diagram, both Group and SoloPerformer are displayed as exclusive subtypes of the Artist superentity type:

Music Artists Enhanced Entity-Relationship Diagram

Diagram description

In order to start the description of the EERD, it is important to point out that your sentence

  • “an Artist must be either a Group or a SoloPerformer (but not both)”

is related to the disjointness and the completeness aspects of the supertype-subtype cluster at hand.

Disjointness

The disjointness feature is particularly important because it is right here where the “or part” that you mentioned comes into play, due to the fact that an Artist has to be either one subtype instance or the other, which I specified in the EERD through the small circle containing the letter “d”, a construct that receives the name of disjoint rule.

When a supertype may be supplemented by one or more of its possible subtypes, this point must be expressed by a small circle holding a label with the letter “o”, a symbol called overlap rule.

Discriminator property

Also within the scope of the disjointness factor of this supertype-subtype association, it is worth paying close attention to the Artist.Type property, since it carries out a very relevant task in this arrangement: it functions as the subtype discriminator. It is named in this way as it is the property that points out the exclusive kind of subtype with which a specific instance of an Artist relates to.

In the cases of nonexclusive clusters, the use of a discriminator property is needless, for a certain supertype can have multiple subtypes as complements (as brought up above).

Total specialization rule and completness

The requirement that stipulates that every Artist must always have a supplementary subtype instance has to do with the completness characteristic of this cluster. This is delineated by means of a total specialization rule, demonstrated via the double-line symbol connecting (a) the Artist supertype with (b) the disjoint rule construct.

Relating Groups with Solo Performers

Evaluating the sentences

  • “a Group is made up of one or more SoloPerformers

and

  • “a SoloPerformer may be a member of many Groups or of no Group”,

one can recognize that both subtypes are involved in a many-to-many (M:N) association (or relationship), which I represented with the diamond-shaped box labeled as Group-SoloPerformer.

If implemented in a relational database as a base table, this component would be very useful to derive (i.e., to carry out the calculation of) the total Number of SoloPerformers that make up a concrete Group (one of the requirements that you specified).

The association between Solo Performers and Instruments

The stipulation

  • “a SoloPerformer […] may play one or more Instruments”

permits us inferring that, at the same time,

  • “an Instrument is played by zero, one or more SoloPerformers”.

Thus, this is another example of a M:N association, and I used the diamond-shaped figure designated SoloPerformer-Instrument to expose it.

Additional material

In order to expound the scope of supertype-subtype structures I am going to include two more resources, i.e.,

  1. an IDEF1X4 diagram presented in Figure 2 (and you can download it from Dropbox as a PDF, as well) that illustrates the expressive capabilities of this sort of diagrams regarding the business domain at issue; and

  2. the respective expository DDL logical structure that exemplifies how to manage the full scenario under discussion by virtue of a SQL database management system.

1. IDEF1X representation

The IDEF1X information modeling technique certainly offers the capability of portraying supertype-subtype structures, though with a limitation: it does not lend a visual mechanism to indicate whether a precise cluster is of an exclusive or noneclusive kind (its “native” symbols can only communicate the complete or incomplete identification of all the possible subentity types of significance). Fortunately, one can employ Information Engineering (IE) notation to show this paramount aspect more accurately while taking advantage of the descriptive power of the IDEF1X standard.

In this technique, the main feature of your question is denominated “categorization relationship”, where a supertype is referred to as “generic entity” and a subtype receives the name of “category entity”. However, I will continue applying the term supertype-subtype in this post because (1) it was used by Dr. Edgar Frank Codd, the originator of the relational model, (2) it is more widely known and (3) IE notation is used instead of the “native” one.

Music Artists IDEF1X Diagram

Foreign keys and supertype-subtype clusters

As demonstrated, IDEF1X provides a further advantage: the means to exhibit FOREIGN KEY (FK) definitions, elements of prime importance if a practitioner is going to represent a supertype-subtype association in a relational database.

In order to portray such a sort of association, the PRIMARY KEY (PK) property of the supertype, i.e. Artist.ArtistNumber, has to migrate to Group and SoloPerformer, although it has been assigned two different role names5, 6, GroupNumber and SoloPerformerNumber respectively, for the purpose of emphasizing the meaning conveyed by the property in the context of each subentity type.

Apart from being characterized as PKs, the Group.GroupNumber and SoloPerformer.SoloPerformerNumber properties are, at the same time, depicted as FOREIGN KEYs (FKs) that make a reference to Artist.ArtistNumber, the supertype PK property.

So, since every SoloPerformer and Group occurrence is existence-dependent on an exact Artist instance, these entity types are involved in an identifying association that takes effect by way of the PK property migration process delineated in the preceding paragraphs.

Foreign keys and associative entity types

The IDEF1X diagram serves as well to illustrate the FKs that compose the PKs of the two associative entity types of relevance, i.e., GroupMember and SoloPerformerInstrument; the first one connects the two subtypes, and the second one links a subtype with an independent entity type, i.e., Instrument.

2. Expository SQL-DDL logical declarations

As explained before, a supertype-subtype structure is a means to express certain kinds of business-domain-specific conceptualizations regarding informational requirements, which can in turn be represented in a database by means of distinct constructs that must correspond to those offered by the particular theoretical paradigm (be it relational, network or hierarchical) followed by the database management system being utilized by the designer.

One of the multiple advantages of the relational paradigm is that it permits representing the information in its natural structure, and the most popular approximations to the systems proposed in the relational theory are the various SQL database management systems.

So, finally, here are some sample DDL statements —including (a) base tables schemas along with (b) some of the pertinent constraints— that represent, at the logical level of abstraction, the conceptual modeling exercise treated above:

--
--
CREATE TABLE Artist ( -- Stands for the supertype.
    ArtistNumber    INT      NOT NULL,
    Name            CHAR(30) NOT NULL,
    Type            CHAR(1)  NOT NULL, -- Holds the discriminator values.
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Artist_PK      PRIMARY KEY (ArtistNumber),
    CONSTRAINT Artist_AK      UNIQUE      (Name), -- ALTERNATE KEY.
    CONSTRAINT Artist_Type_CK CHECK       (Type IN ('G', 'S')) -- Enforces retaining either ‘G’, for ‘Group’, or ‘S’, for ‘SoloPerformer’, only.
);

CREATE TABLE MyGroup ( -- Represents one subtype.
    GroupNumber   INT  NOT NULL, -- To be constrained as PK and FK simultaneously.
    FormationDate DATE NOT NULL,
    --
    CONSTRAINT MyGroup_PK         PRIMARY KEY (GroupNumber),
    CONSTRAINT MyGroupToArtist_FK FOREIGN KEY (GroupNumber)
        REFERENCES Artist (ArtistNumber)  
);

CREATE TABLE SoloPerformer ( -- Denotes the other subtype.
    SoloPerformerNumber INT  NOT NULL, -- To be constrained as PK and FK simultaneously.
    BirthDate           DATE NOT NULL,
    --
    CONSTRAINT SoloPerformer_PK               PRIMARY KEY (SoloPerformerNumber),
    CONSTRAINT SoloPerformerNumberToArtist_FK FOREIGN KEY (SoloPerformerNumber)
        REFERENCES Artist (ArtistNumber)  
);

CREATE TABLE GroupMember ( -- Stands for a M:N association involving the two subtypes.
    MemberNumber INT  NOT NULL,
    GroupNumber  INT  NOT NULL,
    JoinedDate   DATE NOT NULL,
    --
    CONSTRAINT GroupMember_PK                PRIMARY KEY (MemberNumber, GroupNumber), -- Composite PK.
    CONSTRAINT GroupMemberToSoloPerformer_FK FOREIGN KEY (MemberNumber)
        REFERENCES SoloPerformer (SoloPerformerNumber),
    CONSTRAINT GroupMemberToMyGroup_FK       FOREIGN KEY (GroupNumber)
        REFERENCES MyGroup       (GroupNumber)  
);

CREATE TABLE Instrument ( -- Represents an independent entity type.
    InstrumentNumber INT      NOT NULL,
    Name             CHAR(30) NOT NULL,
    --
    CONSTRAINT Instrument_PK PRIMARY KEY (InstrumentNumber),
    CONSTRAINT Instrument_AK UNIQUE      (Name) -- ALTERNATE KEY.  
);

CREATE TABLE SoloPerformerInstrument ( -- Denotes another M:N association, in this case between a subtype and an independent entity type.
    SoloPerformerNumber INT  NOT NULL,
    InstrumentNumber    INT  NOT NULL,
    CreatedDate         DATE NOT NULL,
    --
    CONSTRAINT SoloPerformerInstrument_PK                PRIMARY KEY (SoloPerformerNumber, InstrumentNumber), -- Composite PK.
    CONSTRAINT SoloPerformerInstrumentToSoloPerformer_FK FOREIGN KEY (SoloPerformerNumber)
        REFERENCES SoloPerformer (SoloPerformerNumber),
    CONSTRAINT SoloPerformerInstrumentToInstrument_FK    FOREIGN KEY (InstrumentNumber)
        REFERENCES Instrument    (InstrumentNumber)  
);
--
--

Data integrity and consistency considerations

In agreement with all that has been previously explained, the designer must guarantee that each “supertype” row is at all times complemented by its accompanying “subtype” counterpart and, in turn, make sure that said “subtype” row is compatible with the value contained in the supertype “discriminator” column.

It would be very practical and elegant to enforce said circumstances declaratively (as the relational framework proposes) but, alas, none of the major SQL platforms has provided the suitable mechanisms to do so (as far as I know). Therefore, it is highly convenient to employ ACID TRANSACTIONS so that these conditions are always met in a database (other option would be to make use of TRIGGERS, but they tend to make things untidy).

Data derivation considerations

One of the main aspects of the relational model is that it considers data derivation as a paramount factor in data management. In accordance, it facilitates creating (a) base relations —or base tables in SQL, as shown in the DDL statements above— and (b) derived relations —derived tables in SQL, i.e., those declared by dint of SELECT operations that may be fixed as views for further exploitation—.

So, one can declare a view that gathers the “full” Group data points:

CREATE VIEW FullGroup AS
    SELECT G.GroupNumber,
           A.Name,
           A.CreatedDateTime,
           G.FormationDate
         FROM Artist A
         JOIN MyGroup G 
           ON G.GroupNumber = A.ArtistNumber;

And other view that combines the “full” SoloPerformer pieces of information:

CREATE VIEW FullSoloPerformer AS
    SELECT SP.SoloPerformerNumber,
            A.Name,
            A.CreatedDateTime,
           SP.BirthDate
         FROM Artist A
         JOIN SoloPerformer SP 
           ON SP.SoloPerformerNumber = A.ArtistNumber;

In this manner, it is very easy to manipulate —declaratively— all the significant data via the very same logical-level device, i.e., the relation or table (be it base or derived). Evidently, the usage of views is more effective when the conceptual entity types represented in a relational database possess more properties of interest, but it is a possibility worth illustrating with the present scenario.


References

1 Codd, E. F. (Dec. 1979). Extending the Database Relational Model to Capture More Meaning, ACM Transactions on Database Systems, Volume 4 Issue 4 (pp. 397-434). New York, NY, USA.

2 Chen, P. P. (March 1976). The Entity-Relationship Model—Toward a Unified View of Data, ACM Transactions on Database Systems - Special issue: Papers from the International Conference on Very Large Data Bases: September 22–24, 1975, Framingham, MA, Volume 1 Issue 1 (pp. 9-36). New York, NY, USA.

3 Elmasri, R & Navathe, S. B. (2003). Fundamentals of Database Systems, Fourth Edition. Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.

4 National Institute of Standards and Technology (U.S.) [NIST] (Dec. 1993). Integration Definition for Information Modeling (IDEF1X), Federal Information Processing Standards Publication, Volume 184. USA.

5 Codd, E. F. (June 1970). A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Volume 13 Issue 6 (pp. 377-387). New York, NY, USA.

6 See reference 4