Database Design – Which Design to Choose

database-design

Suppose I have a website that communicates with a database.

On one page of my site I want to display a list of all the students in my database. There are two kinds of students: regular students, who only participate in regular classes (with attributes such as student_id, name, email, password), and special class students, who participate not only in regular classes but also in special extra classes (with all the attributes the regular students have plus a special_class_id attribute).

When listing the students on that page, I want each student to be displayed differently depending on whether they are a special class student or not.

So I'm wondering what the best database design would be so that the code that displays the students is as efficient as possible.

Option 1

Two tables in my database: one for all students (with student_id, name, email, password) and one extra for special class students (with student_id and a special_class_id corresponding to the id of the special class the student participates in). Let's call the table with all students students and the other one special_students.

With this option, for the listing page, the code would pick each row of the students table and display it. It could then check whether that student_id appears in any row of the special_students table, and if it does, display that student differently.
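For illustration, here is a minimal sketch of Option 1 using Python's built-in sqlite3 module. It is only one possible way to do it: instead of checking special_students once per student, it fetches everything with a single LEFT JOIN. The sample rows, the function name list_students_two_tables, and the assumption that a student belongs to at most one special class are mine, not part of the question.

```python
import sqlite3


def list_students_two_tables(conn):
    """Return (student_id, name, special_class_id) for every student.

    special_class_id is None for regular students.
    """
    return conn.execute(
        """
        SELECT s.student_id, s.name, sp.special_class_id
        FROM students AS s
        LEFT JOIN special_students AS sp ON sp.student_id = s.student_id
        ORDER BY s.student_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        email      TEXT NOT NULL,
        password   TEXT NOT NULL
    );
    -- Assumption: at most one special class per student, hence the primary key.
    CREATE TABLE special_students (
        student_id       INTEGER PRIMARY KEY REFERENCES students(student_id),
        special_class_id INTEGER NOT NULL
    );
    -- Made-up sample data.
    INSERT INTO students VALUES
        (1, 'Ann', 'ann@example.com', 'x'),
        (2, 'Bob', 'bob@example.com', 'y');
    INSERT INTO special_students VALUES (2, 7);
    """
)

rows = list_students_two_tables(conn)
for student_id, name, special_class_id in rows:
    kind = "special" if special_class_id is not None else "regular"
    print(student_id, name, kind)
```

The page code can then branch on whether special_class_id is None, so the whole list costs one query rather than one query per student.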

Option 2

Another option would be to have only one table in my database, with an extra special_class_id column. For regular students this column would hold some default value (maybe 0), and for special class students it would hold the special_class_id.

With this option, for the listing page, it seems to me that the code would be faster since it would query only one table, but on the other hand, adding this extra column seems a bit sloppy.
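A comparable sketch of Option 2, again with Python's sqlite3 module. Note one deliberate deviation, which is my assumption rather than what the question proposes: the column is left NULL for regular students instead of using 0, so that "no special class" cannot be confused with a real class id. The sample data and the function name list_students_single_table are also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (
        student_id       INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        email            TEXT NOT NULL,
        password         TEXT NOT NULL,
        -- Assumption: NULL marks a regular student (the question suggests 0).
        special_class_id INTEGER
    );
    -- Made-up sample data.
    INSERT INTO students VALUES
        (1, 'Ann', 'ann@example.com', 'x', NULL),
        (2, 'Bob', 'bob@example.com', 'y', 7);
    """
)


def list_students_single_table(conn):
    """Return (student_id, name, is_special) for every student."""
    rows = conn.execute(
        "SELECT student_id, name, special_class_id FROM students ORDER BY student_id"
    ).fetchall()
    return [(sid, name, class_id is not None) for sid, name, class_id in rows]


listing = list_students_single_table(conn)
for student_id, name, is_special in listing:
    print(student_id, name, "special" if is_special else "regular")
```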

Would the second option have a significant performance advantage over the first, considering a hypothetical situation where the site might have 10 million visitors a day? Can this sloppy extra column interfere in some way and cause problems? Is there any other option for the database design?

Thanks a bunch in advance.

Best Answer

Both solutions are viable, and each has different benefits and drawbacks. The best solution depends on your data and on how many of your queries need this additional column.

  1. Adding the special_class_id to the students table makes this table "wider", meaning that the performance of all queries against this table will be slightly impacted (even if they don't care about this field). A single INT won't make much difference, but it could start to be an issue if the special students need more columns in the future.
  2. Adding a separate table keeps the students table clean, but at the cost of an additional join, which is likely to be slower whenever you want to retrieve that additional column.

The key is to create views on top of your tables, which allows you to change your tables later should it become necessary. The views you create should tie in with your application requirements. As an example, all of the following views are easy to build on top of either schema:

  • Your application will almost certainly need a list of all students. This view should probably contain only the columns for "regular" students (unless your application displays some sort of indicator as to whether or not a student is "special", in which case the view could include that indicator).
  • If your application has a separate listing of special students then you can create a separate view which shows only those students with any additional columns required.
  • Additionally, should you need to look up the special fields for a specific student you could create an extra view consisting of just the student id and special fields.
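As a rough sketch of the view idea, the Python snippet below (stdlib sqlite3) defines a view with the same name and columns on top of each schema, so the application query stays identical whichever table layout is underneath. The view name student_list, the is_special indicator column, the helper function names, and the sample rows are all hypothetical, and the email and password columns are omitted for brevity.

```python
import sqlite3

# The only query the application needs to know about.
APP_QUERY = "SELECT student_id, name, is_special FROM student_list ORDER BY student_id"


def make_two_table_db():
    """Schema with a separate special_students table, plus the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE special_students (
            student_id       INTEGER PRIMARY KEY REFERENCES students(student_id),
            special_class_id INTEGER NOT NULL
        );
        INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO special_students VALUES (2, 7);

        CREATE VIEW student_list AS
            SELECT s.student_id, s.name,
                   (sp.student_id IS NOT NULL) AS is_special
            FROM students AS s
            LEFT JOIN special_students AS sp ON sp.student_id = s.student_id;
        """
    )
    return conn


def make_single_table_db():
    """Schema with a nullable special_class_id column, plus the same view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE students (
            student_id       INTEGER PRIMARY KEY,
            name             TEXT NOT NULL,
            special_class_id INTEGER  -- NULL for regular students (assumption)
        );
        INSERT INTO students VALUES (1, 'Ann', NULL), (2, 'Bob', 7);

        CREATE VIEW student_list AS
            SELECT student_id, name,
                   (special_class_id IS NOT NULL) AS is_special
            FROM students;
        """
    )
    return conn


two_table_rows = make_two_table_db().execute(APP_QUERY).fetchall()
one_table_rows = make_single_table_db().execute(APP_QUERY).fetchall()
print(two_table_rows == one_table_rows)
```

Because the application only ever reads from student_list, switching from one schema to the other means redefining the view, not touching the page code.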

This allows you to pick the simplest solution now, and change your tables later should the performance be an issue.

FWIW my gut feel is that the single-table design (item 1 in my list above, which is Option 2 in your question) will offer better performance, as it avoids a join.