Mysql – Designing database with Category and Tags for Posts – Seeking for guidance/advice

MySQLschema

I'm not really a database novice but I'm not an expert either.

Problem:

I'm having this difficulty on designing my Post, Category and Tags relationship.

I've read Normalization many times but could not fully digest them and use it as a guide to what I'm trying to achieve. I've also encountered EAV, read about it and likewise. Or maybe I just can't choose which to use for this scenario.

I have few solutions in mind but I could not settle on one. I'm craving for the "BEST" choice. I'm aiming to be able to understand and design good databases!

Scenario

I have a Post table, like in any other website like wordpress with post capability.

I want that post to be associated with 1 or more Categories and Tags.

I want the category to have a hierarchy where categories like Cats and Dogs are under Animals category.

Option 1

Categories
    id -- PK
    pid -- FK to self (id), allow NULL
    name
    description
    slug
Tags
    id -- PK
    name
    description
    slug
Cat_Rel
    id -- PK
    pid -- FK: Post Id
    cid -- FK: Category Id
Tag_Rel
    id -- PK
    pid -- FK: Post Id
    tid -- FK: Tag Id

Notice how those Categories/Tags and their relationship with Post are very similar?

I have this urge that I think I could just combined them which leads to my problems.

Options 2

PostGroup -- I don't know how to call this currently but this should work for now
    id -- PK
    pid -- FK: category use only so NULL for all tags
    name
    description
    slug
    type -- and we add a type where we could explicitly tell if it's a category or a tag
Group_Rel -- Now we only have 1 relationship table
    id -- PK
    pid -- FK: Post Id
    gid -- FK: Group Id

So, which one should I follow? Or none of those? Other options?

On your reply, can you include DO's and DON'T, PROS and CONS.

Anything that you think I need to know, understand and realize.

Best Answer

One way to understand what's going on with normalization is to find out what normalization buys you. In other words, what's going to happen if you depart from any given normal form?

For normalization forms second through fifth normal forms, the answer is "update anomalies". That is, when you insert, update, or delete rows, can you end up leaving things in a bad way? One particular kind of bad way is a database that contradicts itself, that stores a given assertion somewhere in the database and stores a contradictory assertion somewhere else in the database.

Normalization helps with this by obviating the problem. That is, it cuts down or eliminates the possibility of contradictory assertions by assuring that facts are stored once, and only once. The fact is either right or it's wrong, but it's not self contradictory.

There are other ways to avoid self contradictory databases, but they involve careful, disciplined programming. There are situations where a non normalized approach gives you more benefits than a normalized approach. But this calls for judgement, and judgement comes with experience.

With regard to EAV, one attraction to EAV is that it allows you to get started without coming up with a logical model for the structure of the data. The reason is that any logical model whatsoever can conform to EAV. So why doesn't everybody do EAV? Because sometimes, the data that results is relatively worthless compared to the expected value when the database was built. I'm not saying EAV is always wrong. I am saying that it often is. Again, this calls for judgement, and judgement comes with experience.