PostgreSQL – Database Design for Varying Relationships

database-designperformancepostgresqlpostgresql-9.4postgresql-performance

I am currently trying to design a database model to store all the possible nutrition data, but for many products there isn't as much data available as for the others, as a result many field could end up as NULL values (I do not want to use any placeholder/default values as I will need to be able to separate which ones are actually missing)

I found that PostgreSql might be a good solution for me, as I do not want to build a EAV (Entity-Attribute-Value) tables, because I have had my experience with it, it doesn't scale very well. (If "products" could have anywhere from 0 to 200 attributes, it would quickly get into tens of millions rows).

PostgreSql has composite types and you can also store them as an array, which allows me to store key value pairs easily within a single column, as a result it is easy to lookup everything related to the "product", I can easily do a join between tables and have the data available.

Now the question is:

Is it a good solution, what kind of obstacles could I face, is there a better solution to my problem?

Would having separate tables for example:

CREATE TABLE products (
    proteins_id FOREIGN KEY,
    lipids_id FOREIGN KEY,
    minerals_id FOREIGN KEY,
    vitamins_id FOREIGN KEY)

each foreign key referencing a table with possibly up to 50 null values in a row

OR

CREATE TYPE custom_type (
   field_1 INTEGER,
   field_2 INTEGER
)

CREATE TABLE products (
    proteins custom_type[],
    lipids custom_type[],
    minerals custom_type[],
    vitamins custom_type[]
)

I understand that the optimal solution might be somewhere in-between those two solution by finding a balance of some sort, but I really need some pointers on that, thank you guys! 🙂

EDIT: Plus another things to consider: The tables might need to be expanded quite often and it would be great if the process were least painful.

I also face the same kind of problem with other tables such as (food) "recipes" as the number of ingredients can be very varying plus I need to store the measures/quantity of the components.

Best Answer

What you are describing is called a subtype. A subtype is a common data structure used for things that are very similar but have some differences. For example, and object Named Clothing can be used to describe many types of personal coverings but T-Shirts and Shorts have very different properties. With a Subtype, you store all the common properties in one table and have a related table of the properties that are different. That main table will have a FK to a Type table that will denote which of the subtype tables this object is.

Entitty: Clothing ClothingID ClothingTypeID -> ClothingType Name Size Color

Type Entitty: ClothingType ClothingTypeID Name

Now for the subtype tables. You will have one for each value in the Type table.

Entity: TShirt ClothingID (Subtypes always have the same primary key value as the parent object) SleeveLength ColorType

Entity: Shorts ClothingID HasPockets LegLength IsSwimSuit

Related Question