Storing data from different sources in a single column

database-design

I want to ask a general question. We are getting data from different stores like magento, woocommerce, amazon aws, shoppify and many others. Data like customers info, orders info, products info is being fetched. All the stores are returning data that is bit same, bit different. Here are two database designs that we are proposing:

  1. Use different columns for every store like magento_customers, shoppify_customers but this do not seems right.
  2. Make a single column for customers from all sources. fill the fields that has data and make others null, like magento returns group_id, but woocommerce do not, so fill group_id by '1' lets suppose and when it's null, keep it null.
    I am a junior developer, and this is my first commercial project so I'm little hesitant.

Best Answer

It sounds like magento customers, shoppify customers, etc. are subclasses (subtypes) of the customer class. This is the terminology in object modeling.

In ER modeling, the terminology is usually generalization/specialization.

There are two techniques often used when designing tables to fit this situation.

One is single table inheritance. Store all the customers in one table, with a customer_type field, and all the fields for all the types. Leave fields NULL when they don't apply to a particular customer.

Another is class table inheritance. Have one table for customers with all the common data, and only that data. Have separate tables for each subclass (subtype) of customer, with data that is specialized to that subtype. In the subclass tables, don't generate a distinct id for the rows, because you don't need to. Use a copy of the customer id as both the PK for the specialized table and an FK to join with the correct row in the generalized table.

Joining specialized data and generalized data is pretty fast, and fairly simple.

With either technique, the problem you originally posed goes away. You can make an FK reference to the customer table, and access both the generalized data and the specialized data.

If you search on all the buzzwords I've given in here, you'll get lots of useful articles on the web that treat the subject in more depth. You can also explore the tag right here.