A simple question regarding database design that has always bothered me

database-design

Say I have a table of schools, and each school is either Public, Private, or For-Profit. As well, each school is either '2-4 Year,' '4+ Year', or 'Graduate'. My initial instinct in this situation is to create another table that holds attributes, then create a junction (many-to-many) table to map a school to its attributes. This seems simple enough, but every time I am faced with this situation my stomach begins to turn and the following two questions always arise:

  1. Is it worth it? What reason really is there not to just store the text (Public, Private, For-Profit) in a column in the school table?

  2. The other issue is in naming. Say the first table is named school, the attribute table is named school_attribute (assume there are also attributes that apply to other things, e.g., companies), then I am stuck naming the junction table school_school_attribute, and that just sounds terrible. The naming problem often leads me to not normalize my table, even though I know that's the wrong thing to do. However, I'd almost rather be stuck with a database that is not 3NF than redundant naming in my tables.

So, first I'd like to know if it is even worth it, and for what reason (e.g., it saves space); second, I'd like to know how people name their tables to avoid redundancy.

EDIT

Another option would be to create a table called attribute with a column named type. The type column would tell me what type of attribute it is (e.g., school, company, etc). Wait, no reason to store the text in the column, let me create a table called attribute_type, and now a junction table named attribute_attribute_type… Aggg! This is my biggest block in database design for some reason.

Best Answer

There really is a whole raft of pros to normalizing your database. If the main issue you're having is naming, then you're in pretty good shape. Apart from space which you've already mentioned:

  • The width of your table will be smaller, this will increase the speed of your selects.
  • Updates and inserts will be quicker because you're inserting/updating integers instead of text
  • Maintaining your data will be easier ie if there was a new type of school which needed to be added, you should just be able to add it to the table.
  • You can create hierarchies in your lookup tables
  • You don't need to reply on your application layer to enforce referential integrity.
  • You can manage changes to your lookup options over time (ie what is "Public" is no longer applicable after July 1st but older records created before this date must still show this as an option?)
  • Loading data into a data warehouse is simplified
  • It makes it easier to other DBAs to understand your schema

Normalization is usually going to have a level of subjectivity. You database schema should reflect a balance between maintainability, complexity and performance. On the flip side, if you're only going to have 1 or 2 tables and a small amount of records, maybe normalization is overkill? I would tend to normalize anyway, because if the database needs to scale up, you're already in a good position to do so.

As for naming, have a think about the most sensible name that will describe the table's contents without it sounding crazy like the example you gave above. Otherwise you'll end up with a schema which is hard to understand.

Hope this helps.

Related Question