Relational Database design best practice

database-design

I am a student and I am trying to design a simple database, so if the ideas seem bad please let me know how to improve.

I have three tables in my mind, one for Professors and one for Students. Both of them have Contacts details. I need to know what is the best practice to achieve a relationship between the three.

I see two possibilities:
A) Having both Professors and Students tables use an auto-increment field and a supplementary default initialised field (let's say with 1 for Professors and 2 for Students). Then create a key in the third table (Contacts) as a combination of the unique ID and default initialized fields.

B) Having for tables instead of three, with two identical structure tables as ContactStudents and ContactProfessors, each serving for the appropriate table.

From my perspective I would choose the first one since if I would want to add new fields to the Contact table both would benefit from it.

Which one do you think is best and why? Is there another, better way, of achieving this?

EDIT:

After that variety of answers and at zagrimsan's suggestion I add the following requirements:
Student can be also a Professor.
A person (Student of Professor) can have multiple contacts.
A Contact can be reused for multiple Students and Professors (same kind) but not both Student and Professor at same time except the case when the same Student is also a Professor.

Best Answer

Problems involving classes and subclasses (or, if you prefer, types and subtypes) come up all the time in database design. Your situation looks like a case in point.

In your case, here's what I would do, using a technique called Class Table Inheritance combined with a technique called Shared Primary key.

Create three tables: Contacts, Professors, and Students. Use the autonumber feature for the ID field of the Contacts table. Have an ID field in the other two tables, but do not use the autonumber feature. Instead, whenever you insert a new Contact, obtain the ID field just generated for the new contact, and provide that ID field as the value for ID in either Professors or Students. The ID field is declared as the primary key in all three tables. You can even create a Contact that is both a student and a professor, if that makes sense in your situation.

You place other attributes in the appropriate table, according to the attribute, as you have suggested.

ID is guaranteed to be unique in all three tables. If you want data about only students, join Contacts and Students matching on the ID field. Likewise for data about professors.

Using shared primary key in this way enforces the 1:1 nature of the relationships between Students or Professors and Contacts.

Good luck!

Related Solutions

Field types database design best practice

Not discounting @gbn's suggestion (in a comment on the question) to possibly store this data somewhere other than in a RDBMS, I will say that if you do decide to go the RDBMS route, you are better off using a single table with a column per each "type" such that they can be strongly-typed. Or, you could use a single table with a single string field as all data can be converted/serialized to a string, but then you have to be careful to do the conversion properly in both directions for all operations, and that might prove problematic if some uses of that table either incorrectly convert or forget to convert altogether.

The reason for choosing a single table is mostly logistics: it is far easier to dynamically select a different column based on a COALESCE or CASE (or CASE-like) clause than it is to dynamically pull from different tables in a query based on the TypeID of the row (at least not without having to LEFT JOIN all "type" tables in all queries (yikes!).

Depending on which RDBMS you are using, there might even be vendor-specific features to assist in this model. For example, SQL Server, starting in version 2008, I believe, has a SPARSE option that can be added to NULLable columns. This allows for a NULL value to take up 0 bytes whereas normally, for fixed-length fields (i.e. pretty much all of them that aren't [N]VAR{something} or {something} VARYING or XML or generically text or SQL_VARIANT), they would always take up their specified number of bytes. The downside is that for non-NULL values, they now take up 4 extra bytes per row. But it can be a huge savings when the column is 75% or more NULL across all rows. This feature was designed specifically to address this type of model.

Just make sure to have a CHECK CONSTRAINT to ensure that for all of the value_ fields, either none, or only one, of them is NOT NULL.

Then you can access via something like:

COALESCE(value_int, value_string, value_datetime, etc.)

And again, if you are using SQL Server (and maybe one or more other vendors have this feature), you can make use of "filtered indexes" to index each specific value_ field where that value is not null:

CREATE NONCLUSTERED INDEX [IX_FieldValues_ValueInt]
  ON dbo.[FieldValues] ([ValueInt])
  WHERE [ValueInt] IS NOT NULL;

This will allow you to target rows that do have values, although probably not when using the generic COALESCE access method, but when targeting a specific Value% field.

AND, looking again at the two options, I would actually go with Option 3, which is a single table. In Option 2 it does not seem like there is much purpose in breaking field_type_id out as that data is not really a parent to the field_value data. A field_id can only have a single value, right? So just put field_type_id in the field_value table. And make use of it in the CHECK CONSTRAINT that ensures a single field at most is NOT NULL, to ensure that the one field, if any, that is NOT NULL matches the field_type_id.

Database Design – Which Design to Choose

Both solutions are viable and each has different benefits / drawbacks - the best solution depends on your data and how many of your queries need this additional column.

Adding the special_class_id to the students table makes this table "wider", meaning that the performance of all queries against this table will be slightly impacted (even if they don't care about this field). A single INT won't make much difference, but it could start to be an issue if the special students need more columns in the future.
Adding a separate table keeps the students table clean however at the cost of an additional join, which is likely to be slower whenever you want to retrieve that additional column.

The key is to create views on top of your tables - this allows you to change your tables later should it become necessary. The views you create should tie in with your application requirements, however as an example all of the following views are really easy to build on top of either schema.

Your application will almost certainly need a list of all students - this view should probably contain only columns for "regular" students (unless your application displays some sort of indicator as to whether or not a student is "special", in this case this view could include that indicator)
If your application has a separate listing of special students then you can create a separate view which shows only those students with any additional columns required.
Additionally, should you need to look up the special fields for a specific student you could create an extra view consisting of just the student id and special fields.

This allows you to pick the simplest solution now, and change your tables later should the performance be an issue.

FWIW my gut feel is that option 1 will offer better performance as it avoids a join.

Best Answer

Related Solutions

Field types database design best practice

Database Design – Which Design to Choose

Related Question