INTRODUCTION AND RELEVANT INFORMATION:
The following example illustrates the problem I face:
Animal has a race, which can be a cat or a dog. Cat can be either Siamese or Persian. Dog can be a German shepherd or Labrador retriver.
Animal is a strong entity, while its race is an attribute that can have one of the two offered values ( cat or a dog ). Both these values are complex ( I have added here only the type of dog/cat to illustrate the problem, but there can also be the cat's / dog's name and bunch of other stuff ).
PROBLEM:
I don't know how to create relational tables for this example.
MY EFFORTS TO SOLVE THE PROBLEM:
I have tried to draw ER diagram, using Chen's notation, that represents the problem but being a beginner I don't know if I did it right. Here is what I have got:
I apologize if I drew something wrong, please correct me if that is the case. I don't wish to simply get "free solution" but also to learn how to deal with this problem so I can solve it on my own in the future.
The only thing that comes to my mind is to create two separate tables, one for cats and one for dogs. Also, the race attribute in the Animal table would only store cat or a dog value. Something like this:
Animal< # Animal_ID, race, other attributes >
Cat < # Cat_ID, $ Animal_ID, breed >
Dog < # Dog_ID, $ Animal_ID, breed >
I really have a bad feeling about my solution and I fear it is wrong, hence the below question.
QUESTIONS:
- How can I transform my example into ER diagram ?
- How to transform that ER diagram into relational tables?
If further info is required leave a comment and I will update my post as soon as possible. Also feel free to add appropriate tags since I am fairly new here.
Thank you.
Best Answer
The proper structure for this scenario is a SubClass / Inheritance model, and is nearly identical to the concept I proposed in this answer: Heterogeneous ordered list of value.
The model proposed in this question is actually quite close in that the
Animal
entity contains the type (i.e.race
) and the properties that are common across all types. However, there are two minor changes that are needed:Remove the Cat_ID and Dog_ID fields from their respective entities:
The key concept here is that everything is an
Animal
, regardless ofrace
:Cat
,Dog
,Elephant
, and so on. Given that starting point, any particularrace
ofAnimal
doesn't truly need a separate identifier since:Animal_ID
is uniqueCat
,Dog
, and any additionalrace
entities added in the future do not, by themselves, fully represent any particularAnimal
; they only have meaning when used in combination with the information contained in the parent entity,Animal
.Hence, the
Animal_ID
property in theCat
,Dog
, etc entities is both the PK and the FK back to theAnimal
entity.Differentiate between types of
breed
:Just because two properties share the same name does not necessarily mean that those properties are the same, even if the name being the same implies such a relationship. In this case, what you really have is actually
CatBreed
andDogBreed
as seperate "types"Initial Notes
VARCHAR
but if you need to store anything outside of the standard ASCII set, you should really useNVARCHAR
.Race
,CatBreed
, andDogBreed
) are not auto-incrementing (i.e. IDENTITY in terms of T-SQL) because they are application constants (i.e. they are part of the application) that are static lookup values in the database and are represented asenum
s in C# (or other languages). If values are added, they are added in controlled situations. I reserve the use of auto-increment fields for user data that comes in via the application."Breed" as "Race"-Specific Approach
This first set of tables are the lookup / types tables:
This second listing is the main "Animal" entity:
This third set of tables are the complimentary sub-class entities that complete the definition of each
Race
ofAnimal
:The model using a shared
breed
type is shown after the "Additional Notes" section.Additional Notes
breed
seems to be a focal point for confusion. It was suggested by jcolebrand (in a comment on the question) thatbreed
is a property shared across the differentrace
s, and the other two answers have it integrated as such in their models. This is a mistake, however, because the values forbreed
are not shared across the different values ofrace
. Yes, I am aware that the two other proposed models attempt to solve this issue by makingrace
a parent ofbreed
. While that technically solves the relationship issue, it doesn't help solve the overall modeling question of what to do about non-common properties, nor how to handle arace
that does not have abreed
. But, in the case that such a property were guaranteed to exist across allAnimal
s, I will include an option for that as well (below).Animal
), orrace
s be stored in theAnimal
entity which is a very flat (and nearly non-relational) way of representing this data. Yes, people do this all of the time, but it means having many NULL fields per row for the properties that are not meant for that particularrace
AND knowing which fields per row are associated with the particularrace
of that record.race
ofAnimal
in the future that does not havebreed
as a property. And even if ALLAnimal
s have abreed
, that wouldn't change the structure due to what has been previously noted aboutbreed
: thatbreed
is dependent on therace
(i.e.breed
forCat
is not the same thing asbreed
forDog
)."Breed" as Common- / Shared- Property Approach
Please note:
The SQL below can be run in the same database as the model presented above:
Race
table is the sameBreed
table is newAnimal
tables have been appended with a2
Breed
being a now common property, it does not seem right not to haveRace
noted in the main/parent entity (even if it is technically relationally correct). So, bothRaceID
andBreedID
are represented inAnimal2
. In order to prevent a mismatch between theRaceID
noted inAnimal2
and aBreedID
that is for a differentRaceID
, I have added a FK on bothRaceID, BreedID
that references a UNIQUE CONSTRAINT of those fields in theBreed
table. I usually despise pointing a FK to a UNIQUE CONSTRAINT, but here is one of the few valid reasons to do so. A UNIQUE CONSTRAINT is logically an "Alternate Key", which makes it valid for this use. Please also note that theBreed
table still has a PK on justBreedID
.BreedID
to be repeated across different values ofRaceID
.BreedID
, so it should still be possible to reference a specific value ofBreed
without having theRaceID
available.Breed
(and are why I prefer theRace
-specificBreed
tables).Breed
have the same properties. There is no easy way in this model to have disparate properties betweenDog
"breeds" andElephant
"breeds". However, there still is a way to do this, which is noted in the "Final Edit" section.Breed
across more than one race. I am not sure if that is desirable to do (or maybe not in the concept of animals but possibly in other situations that would be using this type of model), but it is not possible here.Final Edit (hopefully ;-)
Breed
, it is possible to employ the same subclass / inheritance concept but withBreed
as the main entity. In this setup theBreed
table would have the properties common to all types ofBreed
(just like theAnimal
table) andRaceID
would represent the type ofBreed
(same as it does in theAnimal
table). Then you would have subclass tables such asBreedCat
,BreedDog
, and so on. For smaller projects this might be considered "over-engineering", but it is being mentioned as an option for situations that would benefit from it.For both approaches, it sometimes helps to create Views as short-cuts to the full entities. For example, consider:
CreatedDate
field would be added to theAnimal
table. This field is not needed in any of the subclass tables (e.g.AnimalCat
) as the rows being inserted for both tables should be done at the same time within a transaction.LastModifiedDate
field would be added to theAnimal
table and all subclass tables. This field gets updated only if that particular table is updated: if an update occurs inAnimalCat
but not inAnimal
for a particularAnimalID
, then only theLastModifiedDate
field inAnimalCat
would be set.