Importing order information

database-design

Our new project is to write a customer order management system. My specific task is to import customer orders from various different sources – XML, CSV, XLS… etc.

Our old DBA has left the company and I have stepped up to the plate. I have plenty of SQL experience but for the first time I'm required to design a brand new system with a feasible data structure to hold the orders of many different customers, which we import into the database. I don't want to have to change the database every time we have a new customer order with a slightly different structure.

The problem is that every customer wants to provide us with an output from their computer system, and every customer's orders are different. Our plan is to write a script for each unique customer to take their data file and transform it into our orders table.

So far so good, but what will my customer orders table look like? Every customer is going to have their own unique fields, some will be mappable to generic fields (like order number, delivery date) for example.. but if I'm to build a solid foundation, I need ideas to store this data in a flexible and scalable way.

Does anyone have any experience with this kind of thing? Any ideas or tips would be very much appreciated. Thanks for your time.

Best Answer

I think the best way to approach this would be to have tables like so:

table Customer
    ID
    Name
    ...

table CustomerFields
    CustomerID
    FieldID
    FieldName
    ... (could have things like type, required/optional, size, etc.)

table CustomerOrders
    CustomerID
    OrderID
    OrderNumber
    DeliveryDate
    ... (all your generics and mappable fields)

table CustomerOrderFields
    CustomerID
    OrderID
    FieldID
    FieldValue

By having the table CustomerOrders, you satisfy the ability to map common customer order data to common fields (simplifying reporting at the expense of making the import a little more painful since the fields must be mapped), and the CustomerOrderFields gives you the ability to have the custom fields per customer necessary for the un-mappable data.

The custom fields are still reportable, but not as easily as your generics as they'll come to you in multiple rows (instead of multiple columns). There are some ways around all that depending on your report creator (e.g., pivoting the results).

The only other option would be to do something like this (which, personally, I would avoid):

table CustomerFields
    CustomerID
    Field1Name
    Field2Name
    Field3Name
    ...
    Field99Name

Table CustomerOrders
    (all your generics)
    Field1Value
    Field2Value
    ...
    Field99Value

This has the advantage that all your data is in one row, but also has the disadvantage that it isn't immediately apparent from the row what each value means. The first method can always be joined to the field list to give a good definition of the data in each custom field. In addition, what happens if some company requires 100 custom fields? In the above example, you'd be making changes to your data structures and code whereas in the first example, you'd never have the issue -- customers could have infinite custom fields.

I've seen it done both ways, and both ways work. Both ways have their downsides and upsides. The first is far more scalable, but harder to get in to columns (instead of rows). Everything's a trade-off.

Hope that helps some!