I am taking a course to become a Java developer.
The course involves using databases but we never really design any, unfortunately.
Most of the time we get pre-made databases and we have to implement code on it to Insert, Update, Read or Delete data.
But when my final test comes, chances are I will be making something that involves a database and therefore I want to try designing a few smaller ones to get the hang of a designing since I noticed that a good database design makes a lot of difference when writing the code to work with it.
I hope these types of questions are allowed here and that it's not going to be too broad.
This is a small design I made for something a simple as clubs
and members
with addresses
and phonenumbers
.
- There will be multiple clubs.
- Each club will have multiple members (obvious)
- A member cannot be part of multiple clubs
- A member can have multiple phonenumbers and email addresses
- A member can only have one address
- An address can belong to different members (couples or siblings)
My biggest confusions:
If I want to add a column in
Club
that describes the owner, who will also be a member, what is the best approach without having the
same member listed twice?Should I put all my tables on the automatic increment of an id or would this be a bad idea? (benefits/downsides?)
If I add foreign keys in the foreign keys tab, will these automatically correspond to the correct table or do I have to add
these to the columns also? (see picture 2)And my probably noobiest question at all… Do I put the foreign keys of
phonenumber
and
linking to a person_Id or should I put phoneNumber and email Id's in
the person table?
(My apologies for the language in the pictures not being English, I hope this isn't too much of a problem)
Best Answer
Responses to your individual questions
If —as stated in your specifications—
a Person can be a Member of only one Club
, and you are interested in storing this datum simply as true or false, one option is adding aBIT(1)
or aBOOLEAN
(TINYINT
) column to thePerson
table, and you may wish to call this columnIsClubOwner
. In this way, you can register the fact that a determined person is a club owner exclusively one time. Apart from that, this method also allows the possibility of retaining several people as owners of the same club occurrence. You can see a logical level depiction of this approach in Figure 1.However, you are looking for the best approach and, according to my experience, such approach entails the development of a much more expandable and versatile structure. In this respect, follow the progression of a modeling exercise for these and other points below, in the sections entitled “Covering your remaining specifications”, “Person as Member of multiple Clubs” and “Member and Owner as separate entity types”.
If you face an explicit requirement that indicates the definition of a table with a column of such characteristics, and that column has a valid contextual meaning or serves a particular purpose like being a surrogate for a wide natural PRIMARY KEY (PK), then yes, you should proceed that way.
Otherwise, if you do not have said requirement, I consider it would be unnecesary since you have to store and manage a meaningless extra column and (perhaps?) also an additional INDEX in your database.
As usual, you have to analyze each case along with its overall repercussions in order to decide how to carry on.
In this regard, my advice for you is to create your database structure manually, to code your own
DDL
statements until you get a firm grasp of the subject. If you do so, it will be easier for you to understand the processes that graphical tools are performing “under the hood”.To me, for instance, creating a statement like the following:
Is much more instructive than using GUI tools to execute this kind of tasks, especially now that you are building your first designs.
Personally, I think that this and all your other questions are completely valid and well contextualized.
Returning to the technical aspects that concern us, this precise inquiry offers a good opportunity to review the two following assertions:
A Person can be reached through zero-one-or-many PhoneNumbers
A PhoneNumber can be used to reach one-to-many People
So, one can conclude that there is a many-to-many relationship between
Person
andPhoneNumber
, therefore, I suggest the creation of an associative table namedPersonPhoneNumber
to represent said relationship in your database. The PK of this table should be composed by two different columns:PersonId
(a FOREIGN KEY [FK] pointing toPerson.PersonId
) andPhoneNumber
(a FK that makes reference toPhoneNumber.Number
). For a logical level description of all of the above, see Figure 1.On the other hand, let us examine the two propositions that follow:
A Person can be contacted via zero-one-or-many EmailAddresses
An EmailAddress can be used to contact exactly one Person
Then, yes, you should set a FK referencing to
Person
from theEmailAddress
table, and this table should have a composite PK too, which would be comprised of the columnsPersonId
andAddress
. In this manner, you can ensure that the same combination ofEmailAddress.PersonId
andEmailAddress.Address
can be inserted only once.If you also wish to ensure that a given
EmailAddres.Address
can be stored in one sole row, you just have to establish a UNIQUE CONSTRAINT for this column.Proposed logical data models
In order to expose my suggestions more clearly, I included four distinct IDEF1X[1] logical models that are shown in Figure 1, Figure 2, Figure 3 and Figure 4. I will provide an explanation of the most relevant features displayed in each one of them in the corresponding sections. You can as well download from Dropbox a PDF that integrates in a single model the majority of the elements under discussion.
Covering your remaining specifications
Relating People and Addresses
Let us inspect the two following (slightly reworded) assertions that are relevant to people and addresses:
A Person can only have one Address
An Address can belong to different People (couples or siblings)
So, to deal with these restrictions, you may opt to implement a data model similar to the one you can see in Figure 1.
Taking into account the referred model, you should follow the next steps:
Create a table called
PersonAddress
fixing a relationship betweenPerson
andAddress
. Set the columnsPersonId
andAddressId
as the compound PK of this table.Configure a UNIQUE CONSTRAINT for the
PersonAddress.PersonId
column to ensure that a specific value can be inserted in at most one row of said table. At the logical level, this circumstance implies thatPersonAddress.PersonId
has become an ALTERNATE KEY[2].If the value of
AddressId
in a determinedPersonAddress
insertion attempt has not being already stored, then let the insertion continue, otherwise, when such value already exists in a row, you have to check that (a) thePersonId
who has registered thatAddressId
is also registered as theMarriage.WifeId
if thePersonId
is a male (datum derived via thePerson.GenreCode
) or (b) that thePersonId
is theMarriage.HusbandId
when thePersonId
is a female (derived by virtue ofPerson.GenreCode
, as well). If one of these conditions is met in the appropiate situation, then you should let the INSERT go on.If the above conditions were not fulfilled, there is still a chance for a
PersonAddress
insertion try to be succesful. You have to check that thePersonId
value that is involved in said insertion try shares at least oneProgeny.ParentId
with thePersonId
that has already registered thePersonAddress.AddressId
. If this condition is satisfied, then it means that they are stored asSiblings
in the database, so this INSERT must succeed.As in any relational database implementation, you should seriously consider executing your
DML
operations inside ACID Transactions so that you can protect the integrity and consistency of the data you are working with.Attending the requirements you added in comments: Addresses and Phone Numbers serving equally Clubs and People
On condition that you want to give the chance for addresses and phone numbers to serve both people and clubs, you might make use of a supertype-subtype relationship. Here is an answer in which I give a more detailed treatment to this kind of structures, in case you are interested.
In the present scenario, you could define
Person
andClub
as subtypes of a new entity namedParty
, a term commonly used in legal circles to stand for (a) a person or (b) a group of persons (as noted in sense no. 6). With this method, the relationships betweenAddresses
(orPhoneNumbers
) andPeople
andClubs
would be defined throughParty
, the supertype. See Figure 2 for a depiction of this suggestion.Party and Address
So we can read in this new model that:
A Party keeps zero-one-or-many Addresses
An Address is kept by one-to-many Parties
Thus, there is a many-to-many relationship involving
Party
andAddress
that is expressed by way of thePartyAddress
entity.Party and PhoneNumber
Furthermore, we can interpret that:
A Party is reached through zero-one-or-many PhoneNumbers
A PhoneNumber is used by one-to-many Parties
That is why I added the
PartyPhoneNumber
entity that describes the many-to-many association that takes effect between theParty
andPhoneNumber
entity types.Party and Club or Person
Then, it can also be read that:
A Party is either a Club or a Person
Hence,
Party
supplies a connection from eitherClubs
orPeople
toAddresses
(orPhoneNumbers
).Person as Member of multiple Clubs
As @aldwinaldwin mentiones in his answer, if you want to provide the functionality for a person to be a member of multiple clubs, then you can include a table called
ClubMember
that would be acting as another many-to-many relaltionship, this time, naturally, interconnectingPerson
andClub
.I am going to add to the above that this table can also be useful in the goal of storing any concrete person as owner of multiple clubs, by means of the inclusion of the already mentioned
IsClubOwner
boolean column. In fact, one can say that this new table is the representation of an integral entity type in its own right.As demonstrated in Figure 3, such table requires a composite PK made up of the columns
ClubId
andMemberId
(a role name[3] given toPersonId
), and these columns have to be defined also as FKs pointing, correspondingly, toClub
andPerson
.A more adaptable structure
Employing this setting you can also comply with the initial rule that states that
a Person can be a member of only one Club
, but you just have to add a UNIQUE CONSTRAINT to theMemberId
column, so that a certain value can be entered in no more than one occasion. So, as you can note, this structure is much more adaptable that the one shown in Figure 1, since by dropping the UNIQUE CONSTRAINT (or INDEX) you would open the aforementioned functionality of permitting a person to become a member of different clubs at the same time.Member and Owner as separate entity types
As you know, the storage several facts about the role carried out by a person as owner of a club —besides its mere existance in a true or false attribute— can be very advantageous. For example, you may want to keep the effective date in which a definite person became the owner of a club, hence you can manage this situation by introducing a separate entity type called
ClubOwner
, as presented in Figure 4.Once you construct a table based on this new entity type, you can add the fitting columns that represent the characteristics that come into play exclusively when a
Person
is theOwner
of aClub
. As depicted, this table would hold a PK composed of the FK columns referencingPerson.PersonId
andClub.ClubId
, this way any combination ofClubOwner.OwnerId
(orClubOwner.PersonId
, if you prefer) andClubOwner.ClubId
can be inserted in just one opportunity.Of course, with this configuration you can still derive in boolean form if a
Person
is theOwner
of a particularClub
with the aid of a query that returns a scalar value which can be evaluated to true or false.Notes
1. Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was defined as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) some of the theoretical papers authored by the originator of the Relational Model, i.e., Dr. E. F. Codd; on (b) the Entity-Relationship theory, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown. It is worth noting that IDEF1X was formalized by way of first-order logic.
2. An ALTERNATE KEY is an attribute (or a combination of attributes) that holds values that uniquely identify an entity occurrence but was not chosen as the PK of the pertinent entity type; each entity type can have zero, one or more ALTERNATE KEYS. In an IDEF1X model, they are are indicated as “AK” plus its respective number, e.g., AK1, AK2, etc. They are usually implemented in a SQL DDL structure via a UNIQUE CONSTRAINT (or a UNIQUE INDEX).
3. Role names are denotations (or aliases) assigned to FK attributes in order to express the meaning that they hold within the scope of their respective entity. Their usage is recommended since 1970 by Dr. Codd in his seminal paper entitled “A Relational Model of Data for Large Shared Data Banks”. For its part, IDEF1X —keeping fidelity regards relational practices— also advocates role naming.