MySQL Normalization – Converting Table to 2NF Relation

MySQLnormalizationrelational-theory

I'm currently going through the process of learning about database normalization & having a crack at creating my first relational database. It was going well until I hit the following problem.

In a non-normalised form my justifications table would like like this:

| justification_id | user_id | question | response | action | category | category_id | submitted_at |
-----------------------------------------------------------------------------------------------------

My research & thought process led me to the following:

justification_category

| justification_id | category | category_id | 

justification_response

| justification_id | question | response | 

justification_action

| justification_id | user_id | action | submitted_at |

How does a justification work?

When a user performs an action on a category ( the current categories being tickets, orders, accounts ) a justification is required by the user to state why they are performing that action. I.e. account deletion

An action can have multiple questions attached resulting in multiple responses. For example: Deleting an account would ask the questions: Have you followed our internal steps? Why do you need to perform this action? etc. This is why I proposed the justification_response table because it will not store any NULL fields.

The justification_action table will log the action performed, the user_id ( this is the submitter of the action ) & the time of the action.

Questions

I currently have 3 potential categories where a justification is required on an action, they're tickets, orders & accounts. With this in mind would it be a better approach to have 3 separate tables to describe each category? Would this approach meet 2NF?

I feel I am far away from a 2NF relation. This is because I'm constantly re-using the justification_id as either a FK or a PK. How far away am I?

Best Answer

Kudos for trying to get it right, and understand the principles. I've been doing this for years and still find it difficult to label existing structures with precisely accurate relational theory terminology at times.

Normal forms, it's important to note, only describe individual relations (that is, "tables," in SQL), and not the relationships between them... so the frequent appearance of justification_id does not in any sense move your tables further from 2NF. Indeed, these are all tables of attributes about justifications, so it's only sensible that the justification would appear.

The justification_id seems to be a candidate key, or at least a prime attribute, in each table... isn't it? Candidate keys, pretty much by definition, can't have a denormalizing effect. It's the other attributes that must be analyzed.

I'm confused about category and category_id. Would "category" be (e.g.) "ticket" and "category_id" be (e.g.) "27"...? If so, your naming is a little counter-inituitive, and you're going off in a direction that normalization doesn't really address, but that may cause you difficulty later -- you're mixing data domains in a single attribute (column).

By that, I mean this: the value of "category_id" might be an id from the "ticket" table or an id from the "order" table, etc., depending on which type of category. If I am right about what these columns are being used for, then you don't want to do this, because (among other reasons) it is impossible for the database to correctly impose a foreign key constraint on this column. As a rule of thumb, a value in a column should mean the exact same thing regardless of what's contained in any other column. It might be more correct if there were junction tables, "ticket_has_justification" (with a ticket_id and justification_id), "order_has_justification" (order_id, justification_id), etc. You can still outer join all justifications to the various types of things they justify, but this is a cleaner approach, at least in my opinion.

The "questions" column also might be more appropriate in a "question" table, with justification_response having a question_id column referencing it, but here again, this isn't really addressed by normalization -- whether it's stored in this table as the words of the question, or as a surrogate for the words (question_id), the "question" column is still a prime attribute, with (justification_id, question) being the candidate key and response being dependent on both, so this one is at least 3NF, despite having what looks to me like room for improvement.

Fundamentally, though, I don't see what isn't 2NF about any of these... depending on exactly what "category" and "category_id" mean.