I am relatively new to database design, and I decided to make my own hypothetical database for practice. However, I am having trouble modeling and normalizing it, as I esteem that there are numerous many-to-many (M:N) relationships.
General scenario description
The database is meant to retain data about various People that have worked on the Zelda series. I want to keep track of the Console(s) that a Game can be played on, Employees that have had a part in the Games development, the Jobs the Employee had (many Employees worked on different Jobs across multiple Games), etc.
Business rules
- Multiple Employees can work on multiple Games.
- Multiple Games can be on the same Console.
- Multiple Consoles can be a platform for the same Game.
- Multiple Employees can have the same Job.
- An Employee can have multiple Jobs.
- A Game can have multiple Employees.
- A Game can have multiple types of Jobs in it's development
- Multiple Games can have the same type of Job attached.
- A Console can have multiple People working on it.
- A Person can work on multiple Consoles.
Attribute names and sample values
- Employee Name, which can be split into First and Last (for example “John” and “Doe” )
- Game Title (for example “Ocarina of Time”)
- Job Title (for example “Level Design”, “Director”, “Composure”, “Level Designer”, “Programmer”, “Localization”, etc.).
- Console Name (for example “Game Boy Advance”)
The issue
So far, it seems no matter what I design there are data redundancies and M:N relationships between the entity types of interest everywhere. However I feel that database designers must run into this kind of problem all the time, so there must be a solution.
Note: I am well able to find the data to fill the table, the problem is organizing it into a database with tables in a normalized form.
Best Answer
Yes, the identification of many-to-many (M:N for brevity) associations or relationships is a situation that a database practitioner faces quite commonly when laying out a conceptual schema. Associations of said cardinality ratios come about in business environments of very different nature, and when properly represented at the logical level by means of, e.g., a SQL-DDL arrangement, they do not introduce harmful redundancies.
In this way, the objective of a database modeling exercise should be to mirror the relevant characteristics of the business context of interest with high precision; therefore, if you identify correctly that there are numerous M:N associations, then you must express them in (a) the conceptual schema and also in (b) the respective logical-level declarations, no matter how many connections of that —or any other— kind of cardinality ratios have to be addressed.
Business rules
You have supplied a well-contextualized question, and have also clarified that the database you are working on is purely hypothetical, which is an important point since I esteem that a “real world” business scenario like the one under consideration would be much more extensive and, hence, would imply more complex informational requirements.
I decided to (1) make a few modifications and expansions to the business rules you have provided in order to (2) produce a more descriptive conceptual schema —although still rather hypothetical—. Here are some of the formulations that I put together:
1 Party is a term used in legal contexts when referring to either an individual or a group of individuals that compose a single entity, so this denomination is suitable to represent People and Organizations.
IDEF1X diagram
Subsequently, I created the IDEF1X2 diagram shown in Figure 1 (make sure to click the link to see it in a higher resolution), consolidating in a single graphical device the business rules presented above (along with some others that seem relevant):
2 Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is based on (a) the early theoretical material authored by the sole originator of the relational model, i.e., Dr. E. F. Codd; on (b) the entity-relationship view of data, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.
As you can see, I have depicted only three M:N associations by way of the corresponding associative entity types, i.e.:
Among other aspects, there are two distinct supertype-subtype structures, where:
Person and Organization are mutually exclusive entity subtypes of Party, their entity supertype
Product is the supertype of System and Game, which in turn are mutually exclusive subtypes
In case you are not familiar with supertype-subtype associations, you might find of help, e.g., my answers to the questions entitled:
Illustrative logical SQL-DDL layout
Successively, we must make sure that, at the logical level:
So I declared the following DDL arrangement based on the IDEF1X diagram previously shown:
It is opportune to stress that there are declarations of composite PRIMARY KEY constraints across several tables, which stand for the hierarchy of connections that take place among conceptual entity types, arrangement that can be very beneficial with respect to data retrieval when, e.g., expressing SELECT operations that include JOIN clauses to obtain derived tables.
Yes, (i) every M:N association and (ii) every one of the associated entity types are denoted by (iii) the corresponding table in the logical DDL structure, so pay special attention to the PRIMARY and FOREIGN KEY constraints (and the notes I left as comments) of tables representing these conceptual elements, because they assist in ensuring that the connections between the relevant rows meet the applicable cardinality ratios.
The usage of composite keys was introduced by Dr. E. F. Codd from the very origin of the relational paradigm, as demonstrated in the examples he included in his 1970 seminal paper entitled A Relational Model for Large Shared Data Banks (which, precisely, also presents the most elegant method to handle conceptual M:N associations).
I put up a db<>fiddle and a SQL Fiddle, both running on Microsoft SQL Server 2014, so that the structure can be tested “in action”.
Normalization
Normalization is a logical-level procedure that implies, basically speaking:
Eliminating non-atomic columns via first normal form so that data manipulation and constriction are much easier to cope with by the data sublanguage of use (e.g., SQL).
Getting rid of undesirable dependencies among the columns of a specific table by virtue of the successive normal forms to avoid update anomalies.
Naturally, one has to take into account the meaning carried by the table(s) and column(s) at issue.
I like to think of normalization as a test founded on science that a designer applies to the pertinent elements once he or she has delineated a stable logical-level arrangement in order to determine whether its items comply with every one of the normal forms or not. Then, if needed, the designer takes the appropriate correcting measures.
Redundancy
In the relational model, while duplication of values contained in columns is not only acceptable but expected, duplicate rows are forbidden. To that extent, as far as I can see, duplicate rows and other kinds of harmful redundancies are prevented in all the tables comprised in the logical layout exposed before, perhaps you would like to clarify your concern in this regard.
Anyway, you can certainly (a) evaluate on your own said structure by dint of the normal forms to define if it meets the requirements and (b) modify it if necessary.
Related resources
Ternary associations
There is another important aspect that you brought up via comments (posted in an now-deleted answer):
That circumstance seems to indicate that one of your concerns has to do with conceptual ternary associations. Basically speaking, this sort of associations comes about when there exists (1) a relationship involving (2) two other relationships, in other words “a relationship between relationships” —a typical situation too, since a relationship is an entity in its own right—.
These arrangements, when suitably managed, do not bring harmful redundancies either. And, yes, if there is a certain use case where you identify that such relationships present themselves among “real world” entity types, you have to (i) model and (ii) declare them with accuracy at the logical level.