An Orders table I'm dealing with at present (in the design phase) has about 10 or more columns. Each OrderID in the Orders table will have certain significant events or milestones. The number of Orders could go into the millions. At the moment, I'm aware of about 20 important events to track for each Order, but that number may increase or decrease.
Of these 20 or so events, only a handful will always take place, while the rest are just potential events. Also, many – but not all – of these events have at least two points which need storage:
- the Date the event is supposed to happen (if it is no longer just a "potential" event)
- the Date the event did happen
The events data will be updated in the database from external sources in the form of .xls or .csv files. Any given event can happen at most once per Order.
What would be the best design for storing this milestone event data?
Would you try to use a somewhat normalized design, where there is an Events
table designed similar to this…
CREATE TABLE dbo.Events (
EventID INT
, OrderID INT
, EventExpectedDate DATETIME2(7)
, EventActualDate DATETIME2(7)
, EventTypeID INT
, EventSkip BIT
...
)
Or would you try a less normalized approach and flatten all the possible events out into one very wide events table that mimics what the .xls/.csv update files will look like?
CREATE TABLE dbo.Events (
EventID INT
, OrderID INT
, Event_1_ExpDate DATETIME2(7) /* No, the actual names would not include 1, 2, 3 */
, Event_1_ActDate DATETIME2(7) /* The actual names would name type of Event */
, Event_2_ExpDate DATETIME2(7)
, Event_2_ActDate DATETIME2(7)
...
, Event_20_ExpDate DATETIME2(7)
, Event_20_ActDate DATETIME2(7)
)
Or is there another pattern you would use?
Best Answer
Always normalize unless you have a very good reason not to; it shouldn't come down to a coin flip.
Neither of your proposed designs seems optimal to me for a variable or unknown number of events: since "that number may increase or decrease," both designs force you to change the schema and the code whenever it does.
I also don't believe you have to base your core table design on the format of the input files. This is why we have staging tables, ETL processes, SSIS, C#, and so on. There are many ways to transform data from flat files into a format that is more appropriate for a relational database. Don't design your database around what your flat files look like.
So you could have staging tables that look like the input files, but then you would load the data into real tables that look perhaps like this:
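The original answer's table definitions did not survive in this copy; based on the two table names referenced in the following paragraph (dbo.Events for event types and dbo.OrderEventLog for per-order occurrences), a plausible sketch might be the following. The column names, constraints, and the referenced dbo.Orders table are assumptions for illustration, not the answer's exact definitions:

```sql
-- One row per *type* of event (the ~20 milestones),
-- extensible by INSERT rather than by schema change.
CREATE TABLE dbo.Events (
      EventTypeID INT IDENTITY(1,1) PRIMARY KEY
    , EventName   VARCHAR(100) NOT NULL UNIQUE
);

-- One narrow row per (Order, EventType) occurrence.
CREATE TABLE dbo.OrderEventLog (
      OrderID      INT          NOT NULL REFERENCES dbo.Orders (OrderID)
    , EventTypeID  INT          NOT NULL REFERENCES dbo.Events (EventTypeID)
    , ExpectedDate DATETIME2(7) NULL  -- NULL while the event is only "potential"
    , ActualDate   DATETIME2(7) NULL  -- NULL until the event actually happens
      -- Each event can happen at most once per order:
    , CONSTRAINT PK_OrderEventLog PRIMARY KEY (OrderID, EventTypeID)
);
```

The composite primary key enforces the "each event happens only once" rule from the question, and the two nullable dates cover the expected/actual pair that most events need.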
Now when you have a new type of event, you just insert it into the dbo.Events table, and then you can insert related events into dbo.OrderEventLog without having to change the schema of either table, and without having to change the interface to your stored procedures or the bulk inserts into the staging tables. This also keeps the history table (and all of its indexes) nice and narrow, and allows you to build indexes that favor searches for specific types of events. You can always flatten the data later, which is typically what you do once you get to the point of building a data warehouse around the data.
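To make the staging-to-real-table step concrete, here is one hedged sketch of the load. The staging table dbo.EventStaging and all column names are assumptions for illustration (the staging table would be shaped like the flat files and filled via BULK INSERT or SSIS):

```sql
-- Upsert staged rows into the narrow log, resolving event names to IDs.
-- dbo.EventStaging(OrderID, EventName, ExpectedDate, ActualDate) is a
-- hypothetical staging table mirroring the .csv/.xls layout.
MERGE dbo.OrderEventLog AS tgt
USING (
    SELECT s.OrderID, e.EventTypeID, s.ExpectedDate, s.ActualDate
    FROM dbo.EventStaging AS s
    JOIN dbo.Events AS e
      ON e.EventName = s.EventName          -- name-to-ID lookup
) AS src
   ON  tgt.OrderID     = src.OrderID
   AND tgt.EventTypeID = src.EventTypeID
WHEN MATCHED THEN
    UPDATE SET tgt.ExpectedDate = src.ExpectedDate,
               tgt.ActualDate   = src.ActualDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderID, EventTypeID, ExpectedDate, ActualDate)
    VALUES (src.OrderID, src.EventTypeID, src.ExpectedDate, src.ActualDate);
```

A new event type only requires a new row in dbo.Events; the MERGE itself never changes, which is the point of keeping the log narrow.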