Storing data from different sources in a single column

database-design

I want to ask a general question. We are getting data from different stores like magento, woocommerce, amazon aws, shoppify and many others. Data like customers info, orders info, products info is being fetched. All the stores are returning data that is bit same, bit different. Here are two database designs that we are proposing:

Use different columns for every store like magento_customers, shoppify_customers but this do not seems right.
Make a single column for customers from all sources. fill the fields that has data and make others null, like magento returns group_id, but woocommerce do not, so fill group_id by '1' lets suppose and when it's null, keep it null.
I am a junior developer, and this is my first commercial project so I'm little hesitant.

Best Answer

It sounds like magento customers, shoppify customers, etc. are subclasses (subtypes) of the customer class. This is the terminology in object modeling.

In ER modeling, the terminology is usually generalization/specialization.

There are two techniques often used when designing tables to fit this situation.

One is single table inheritance. Store all the customers in one table, with a customer_type field, and all the fields for all the types. Leave fields NULL when they don't apply to a particular customer.

Another is class table inheritance. Have one table for customers with all the common data, and only that data. Have separate tables for each subclass (subtype) of customer, with data that is specialized to that subtype. In the subclass tables, don't generate a distinct id for the rows, because you don't need to. Use a copy of the customer id as both the PK for the specialized table and an FK to join with the correct row in the generalized table.

Joining specialized data and generalized data is pretty fast, and fairly simple.

With either technique, the problem you originally posed goes away. You can make an FK reference to the customer table, and access both the generalized data and the specialized data.

If you search on all the buzzwords I've given in here, you'll get lots of useful articles on the web that treat the subject in more depth. You can also explore the subtypes tag right here.

Related Solutions

Importing order information

I think the best way to approach this would be to have tables like so:

table Customer
    ID
    Name
    ...

table CustomerFields
    CustomerID
    FieldID
    FieldName
    ... (could have things like type, required/optional, size, etc.)

table CustomerOrders
    CustomerID
    OrderID
    OrderNumber
    DeliveryDate
    ... (all your generics and mappable fields)

table CustomerOrderFields
    CustomerID
    OrderID
    FieldID
    FieldValue

By having the table CustomerOrders, you satisfy the ability to map common customer order data to common fields (simplifying reporting at the expense of making the import a little more painful since the fields must be mapped), and the CustomerOrderFields gives you the ability to have the custom fields per customer necessary for the un-mappable data.

The custom fields are still reportable, but not as easily as your generics as they'll come to you in multiple rows (instead of multiple columns). There are some ways around all that depending on your report creator (e.g., pivoting the results).

The only other option would be to do something like this (which, personally, I would avoid):

table CustomerFields
    CustomerID
    Field1Name
    Field2Name
    Field3Name
    ...
    Field99Name

Table CustomerOrders
    (all your generics)
    Field1Value
    Field2Value
    ...
    Field99Value

This has the advantage that all your data is in one row, but also has the disadvantage that it isn't immediately apparent from the row what each value means. The first method can always be joined to the field list to give a good definition of the data in each custom field. In addition, what happens if some company requires 100 custom fields? In the above example, you'd be making changes to your data structures and code whereas in the first example, you'd never have the issue -- customers could have infinite custom fields.

I've seen it done both ways, and both ways work. Both ways have their downsides and upsides. The first is far more scalable, but harder to get in to columns (instead of rows). Everything's a trade-off.

Hope that helps some!

Tables for different sources of payments

I normally go with modified version of (2).

payment_type (payment_type_id, PK(payment_type_id) );
payments(payment_id , payment_type_id, [other_attrubutes],
  PK(payment_id), UNIQUE(payment_id,payment_type_id), FK(payment_type_id) 
);

paypal_payment(payment_id , payment_type_id, [paypal_attributes], 
 PK(payment_id), FK(payment_id , 
payment_type_id)   REFERENCES payments(payment_type_id,payment_type), 
 CHECK  (payment_type_id='paypal')
)
manual_payment(payment_id , payment_type, [manual_attributes], 
 PK(payment_id), FK(payment_id , 
payment_type_id)   REFERENCES payments(payment_type_id,payment_type), 
 CHECK  (payment_type_id='manual')
)

Check constraints on each detail table ensures that manual table details never go into paypal table, and vice versa. Unique constraint allows detail tables to have a reference to 2 columns.

(3) is also not so bad, but I don't like exclusive arcs - adding new payment type will require changes to payments, and trigger used to enforce that only one of columns is not null.

Best Answer

Related Solutions

Importing order information

Tables for different sources of payments

Related Question