How to Transform Variable Entity into Relational Table

database-designsubtypes

INTRODUCTION AND RELEVANT INFORMATION:

The following example illustrates the problem I face:

Animal has a race, which can be a cat or a dog. Cat can be either Siamese or Persian. Dog can be a German shepherd or Labrador retriver.

Animal is a strong entity, while its race is an attribute that can have one of the two offered values ( cat or a dog ). Both these values are complex ( I have added here only the type of dog/cat to illustrate the problem, but there can also be the cat's / dog's name and bunch of other stuff ).

PROBLEM:

I don't know how to create relational tables for this example.

MY EFFORTS TO SOLVE THE PROBLEM:

I have tried to draw ER diagram, using Chen's notation, that represents the problem but being a beginner I don't know if I did it right. Here is what I have got:

enter image description here

I apologize if I drew something wrong, please correct me if that is the case. I don't wish to simply get "free solution" but also to learn how to deal with this problem so I can solve it on my own in the future.

The only thing that comes to my mind is to create two separate tables, one for cats and one for dogs. Also, the race attribute in the Animal table would only store cat or a dog value. Something like this:

Animal< # Animal_ID, race, other attributes >
Cat < # Cat_ID, $ Animal_ID, breed >
Dog < # Dog_ID, $ Animal_ID, breed >

I really have a bad feeling about my solution and I fear it is wrong, hence the below question.

QUESTIONS:

How can I transform my example into ER diagram ?
How to transform that ER diagram into relational tables?

If further info is required leave a comment and I will update my post as soon as possible. Also feel free to add appropriate tags since I am fairly new here.

Thank you.

Best Answer

The proper structure for this scenario is a SubClass / Inheritance model, and is nearly identical to the concept I proposed in this answer: Heterogeneous ordered list of value.

The model proposed in this question is actually quite close in that the Animal entity contains the type (i.e. race) and the properties that are common across all types. However, there are two minor changes that are needed:

Remove the Cat_ID and Dog_ID fields from their respective entities:

The key concept here is that everything is an Animal, regardless of race: Cat, Dog, Elephant, and so on. Given that starting point, any particular race of Animal doesn't truly need a separate identifier since:
1. the Animal_ID is unique
2. the Cat, Dog, and any additional race entities added in the future do not, by themselves, fully represent any particular Animal; they only have meaning when used in combination with the information contained in the parent entity, Animal.
Hence, the Animal_ID property in the Cat, Dog, etc entities is both the PK and the FK back to the Animal entity.
Differentiate between types of breed:

Just because two properties share the same name does not necessarily mean that those properties are the same, even if the name being the same implies such a relationship. In this case, what you really have is actually CatBreed and DogBreed as seperate "types"

Initial Notes

The SQL is specific to Microsoft SQL Server (i.e. is T-SQL). Meaning, be careful about datatypes as they are not the same across all RDBMS's. For example, I am using VARCHAR but if you need to store anything outside of the standard ASCII set, you should really use NVARCHAR.
The ID fields of the "type" tables (Race, CatBreed, and DogBreed) are not auto-incrementing (i.e. IDENTITY in terms of T-SQL) because they are application constants (i.e. they are part of the application) that are static lookup values in the database and are represented as enums in C# (or other languages). If values are added, they are added in controlled situations. I reserve the use of auto-increment fields for user data that comes in via the application.
The naming convention I use is to name each subclass table starting with the main class name followed by the subclass name. This helps organize the tables as well as indicates clearly (without looking at the FKs) the relationship of the subclass table to the main entity table.
Please see "Final Edit" section at the end for a note regarding Views.

"Breed" as "Race"-Specific Approach

Breed As Race-specific Diagram
This first set of tables are the lookup / types tables:

CREATE TABLE Race
(
  RaceID INT NOT NULL PRIMARY KEY
  RaceName VARCHAR(50) NOT NULL
);

CREATE TABLE CatBreed
(
  CatBreedID INT NOT NULL PRIMARY KEY,
  BreedName VARCHAR(50),
  CatBreedAttribute1 INT,
  CatBreedAttribute2 VARCHAR(10)
  -- other "CatBreed"-specific properties as needed
);

CREATE TABLE DogBreed
(
  DogBreedID INT NOT NULL PRIMARY KEY,
  BreedName VARCHAR(50),
  DogBreedAttribute1 TINYINT
  -- other "DogBreed"-specific properties as needed
);

This second listing is the main "Animal" entity:

CREATE TABLE Animal
(
  AnimalID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
  RaceID INT NOT NULL, -- FK to Race
  Name VARCHAR(50)
  -- other "Animal" properties that are shared across "Race" types
);

ALTER TABLE Animal
  ADD CONSTRAINT [FK_Animal_Race]
  FOREIGN KEY (RaceID)
  REFERENCES Race (RaceID);

This third set of tables are the complimentary sub-class entities that complete the definition of each Race of Animal:

CREATE TABLE AnimalCat
(
  AnimalID INT NOT NULL PRIMARY KEY, -- FK to Animal
  CatBreedID INT NOT NULL, -- FK to CatBreed
  HairColor VARCHAR(50) NOT NULL
  -- other "Cat"-specific properties as needed
);

ALTER TABLE AnimalCat
  ADD CONSTRAINT [FK_AnimalCat_CatBreed]
  FOREIGN KEY (CatBreedID)
  REFERENCES CatBreed (CatBreedID);

ALTER TABLE AnimalCat
  ADD CONSTRAINT [FK_AnimalCat_Animal]
  FOREIGN KEY (AnimalID)
  REFERENCES Animal (AnimalID);


CREATE TABLE AnimalDog
(
  AnimalID INT NOT NULL PRIMARY KEY, -- FK to Animal
  DogBreedID INT NOT NULL, -- FK to DogBreed
  HairColor VARCHAR(50) NOT NULL
  -- other "Dog"-specific properties as needed
);

ALTER TABLE AnimalDog
  ADD CONSTRAINT [FK_AnimalDog_DogBreed]
  FOREIGN KEY (DogBreedID)
  REFERENCES DogBreed (DogBreedID);

ALTER TABLE AnimalDog
  ADD CONSTRAINT [FK_AnimalDog_Animal]
  FOREIGN KEY (AnimalID)
  REFERENCES Animal (AnimalID);

The model using a shared breed type is shown after the "Additional Notes" section.

Additional Notes

The concept of breed seems to be a focal point for confusion. It was suggested by jcolebrand (in a comment on the question) that breed is a property shared across the different races, and the other two answers have it integrated as such in their models. This is a mistake, however, because the values for breed are not shared across the different values of race. Yes, I am aware that the two other proposed models attempt to solve this issue by making race a parent of breed. While that technically solves the relationship issue, it doesn't help solve the overall modeling question of what to do about non-common properties, nor how to handle a race that does not have a breed. But, in the case that such a property were guaranteed to exist across all Animals, I will include an option for that as well (below).
The models proposed by vijayp and DavidN (which seem to be identical) do not work because:
1. They either
  1. do not allow for non-common properties to be stored (at least not for individual instances of any Animal), or
  2. require that all properties for all races be stored in the Animal entity which is a very flat (and nearly non-relational) way of representing this data. Yes, people do this all of the time, but it means having many NULL fields per row for the properties that are not meant for that particular race AND knowing which fields per row are associated with the particular race of that record.
2. They do not allow for adding a race of Animal in the future that does not have breed as a property. And even if ALL Animals have a breed, that wouldn't change the structure due to what has been previously noted about breed: that breed is dependent on the race (i.e. breed for Cat is not the same thing as breed for Dog).

"Breed" as Common- / Shared- Property Approach

enter image description here
Please note:

The SQL below can be run in the same database as the model presented above:
1. The Race table is the same
2. The Breed table is new
3. The three Animal tables have been appended with a 2
Even with Breed being a now common property, it does not seem right not to have Race noted in the main/parent entity (even if it is technically relationally correct). So, both RaceID and BreedID are represented in Animal2. In order to prevent a mismatch between the RaceID noted in Animal2 and a BreedID that is for a different RaceID, I have added a FK on both RaceID, BreedID that references a UNIQUE CONSTRAINT of those fields in the Breed table. I usually despise pointing a FK to a UNIQUE CONSTRAINT, but here is one of the few valid reasons to do so. A UNIQUE CONSTRAINT is logically an "Alternate Key", which makes it valid for this use. Please also note that the Breed table still has a PK on just BreedID.
1. The reason for not going with just a PK on the combined fields and no UNIQUE CONSTRAINT is that it would allow for the same BreedID to be repeated across different values of RaceID.
2. The reason for not switching which the PK and UNIQUE CONSTRAINT around is that this might not be the only usage of BreedID, so it should still be possible to reference a specific value of Breed without having the RaceID available.
While the following model does work, it has two potential flaws regarding the concept of a shared Breed (and are why I prefer the Race-specific Breed tables).
1. There is an implicit assumption that ALL values of Breed have the same properties. There is no easy way in this model to have disparate properties between Dog "breeds" and Elephant "breeds". However, there still is a way to do this, which is noted in the "Final Edit" section.
2. There is no way to share a Breed across more than one race. I am not sure if that is desirable to do (or maybe not in the concept of animals but possibly in other situations that would be using this type of model), but it is not possible here.

CREATE TABLE Race
(
  RaceID INT NOT NULL PRIMARY KEY,
  RaceName VARCHAR(50) NOT NULL
);

CREATE TABLE Breed
(
  BreedID INT NOT NULL PRIMARY KEY,
  RaceID INT NOT NULL, -- FK to Race
  BreedName VARCHAR(50)
);

ALTER TABLE Breed
  ADD CONSTRAINT [UQ_Breed]
  UNIQUE (RaceID, BreedID);

ALTER TABLE Breed
  ADD CONSTRAINT [FK_Breed_Race]
  FOREIGN KEY (RaceID)
  REFERENCES Race (RaceID);

CREATE TABLE Animal2
(
  AnimalID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
  RaceID INT NOT NULL, -- FK to Race, FK to Breed
  BreedID INT NOT NULL, -- FK to Breed
  Name VARCHAR(50)
  -- other properties common to all "Animal" types
);

ALTER TABLE Animal2
  ADD CONSTRAINT [FK_Animal2_Race]
  FOREIGN KEY (RaceID)
  REFERENCES Race (RaceID);

-- This FK points to the UNIQUE CONSTRAINT on Breed, _not_ to the PK!
ALTER TABLE Animal2
  ADD CONSTRAINT [FK_Animal2_Breed]
  FOREIGN KEY (RaceID, BreedID)
  REFERENCES Breed (RaceID, BreedID);


CREATE TABLE AnimalCat2
(
  AnimalID INT NOT NULL PRIMARY KEY, -- FK to Animal
  HairColor VARCHAR(50) NOT NULL
);

ALTER TABLE AnimalCat2
  ADD CONSTRAINT [FK_AnimalCat2_Animal2]
  FOREIGN KEY (AnimalID)
  REFERENCES Animal2 (AnimalID);

CREATE TABLE AnimalDog2
(
  AnimalID INT NOT NULL PRIMARY KEY,
  HairColor VARCHAR(50) NOT NULL
);

ALTER TABLE AnimalDog2
  ADD CONSTRAINT [FK_AnimalDog2_Animal2]
  FOREIGN KEY (AnimalID)
  REFERENCES Animal2 (AnimalID);

Final Edit (hopefully ;-)

Regarding the possibility (and then difficulty) of handling disparate properties between types of Breed, it is possible to employ the same subclass / inheritance concept but with Breed as the main entity. In this setup the Breed table would have the properties common to all types of Breed (just like the Animal table) and RaceID would represent the type of Breed (same as it does in the Animal table). Then you would have subclass tables such as BreedCat, BreedDog, and so on. For smaller projects this might be considered "over-engineering", but it is being mentioned as an option for situations that would benefit from it.

For both approaches, it sometimes helps to create Views as short-cuts to the full entities. For example, consider:

CREATE VIEW Cats AS
   SELECT  an.AnimalID,
           an.RaceID,
           an.Name,
           -- other "Animal" properties that are shared across "Race" types
           cat.CatBreedID,
           cat.HairColor
           -- other "Cat"-specific properties as needed
   FROM    Animal an
   INNER JOIN  AnimalCat cat
           ON  cat.AnimalID = an.AnimalID
   -- maybe add in JOIN(s) and field(s) for "Race" and/or "Breed"

While not part of the logical entities, it is fairly common to have audit fields in the tables to at least get a sense of when the records are being inserted and updated. So in practical terms:
1. A CreatedDate field would be added to the Animal table. This field is not needed in any of the subclass tables (e.g. AnimalCat) as the rows being inserted for both tables should be done at the same time within a transaction.
2. A LastModifiedDate field would be added to the Animal table and all subclass tables. This field gets updated only if that particular table is updated: if an update occurs in AnimalCat but not in Animal for a particular AnimalID, then only the LastModifiedDate field in AnimalCat would be set.