Similar data with fields redundant in one table

denormalizationnormalization

I need four different plans to be stored in one or more tables. All four plans share the fields however some plans have extra fields.

All plans fit into the plans-table:

PLANS(id, provider_id, plan_type, name, desc, price, link)

However based on the plan type it requires more fields to be added and/or some fields will be empty.

As an example it may be that if plan type A is chosen there are no empty fields but if plan type B is chosen there will a requirement of two additional fields like:

PLANS(id, provider_id, plan_type, name, desc, price, link, strength, acceptable)

In other cases plan C requires all fields except strength field.

Leaving empty fields means it's not fully normalised, but my reasons for leaving it like so is because I want to use one QUERY to find all the different types of plans for a provider using a GROUP BY with a server side script.

I could use an additional table to store the additional fields with a 1:1 relationship or have 4 separate tables?

If I do decide to have 3 additional tables (with only the extra fields for each plan) then I could just do a JOIN when needed.

Here's what it could possibly look like (I've omitted some fields):

enter image description here

Is this a good approach?

Best Answer

A good way to do this is to use the Entity-Attribute-Value approach when modelling sparse attributes (https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model).

For you the relationships would probably look like this:

Relationships

Your tables would then look something like:

Plan table

Plan Field

Plan Field Value

Note I haven't specified the PlanTemplate table, this would show all possible fields for a plan.

There are a few caveats with this approach:

  1. It requires that you almost always pivot and join the PlanFieldValue table, performance can be hampered. Usually views are necessary, there is a dependency here on what DBMS you are using. Almost all support pivoting, but there are varying levels of support and performance
  2. Querying and displaying data from the PlanFieldValue on to your UI is onerous, requires using a view and updates/inserts need to be handled differently. More dev time required.
  3. Indexing and performance of the tables is often hampered if millions of rows are in the table, for instance
  4. All attribute information has to be checked manually, for instance field length and nullability needs to be checked manually against the template, possibly using another column in PlanField that has a Nullable flag.

Otherwise its a good approach that I have used quite a few times. One big advantage is you can add new fields without changes to your database structure - a huge bonus if the field values are changeable or likely to be increased.

However if the number of fields for each Plan is low and it is unlikely you will add more fields, a single plan table with individual columns still isn't too bad. It doesn't really matter having some columns blank but it does make it hard to track nullability in the database.