SQL Server 2008 – Table Design for Tracking Significant Events

best-practices | database-design | sql-server-2008

An Orders table I'm currently designing has 10 or more columns.

Each OrderID in the Orders table will have certain significant events or milestones. The number of Orders could go into the millions. At the moment, I'm aware of about 20 important events to track for each Order, but that number may increase or decrease.

Of these 20 or so events, only a handful will always take place; the rest are just potential events. Also, many – but not all – of these events have at least two dates that need to be stored:

  1. the Date the event is supposed to happen (if it is no longer just a "potential" event)
  2. the Date the event did happen

The event data will be loaded into the database from external sources in the form of .xls or .csv files. Each event can happen at most once per order.

What would be the best design for storing this milestone event data?
Would you try to use a somewhat normalized design, where there is an Events table designed similar to this…

CREATE TABLE dbo.Events (
    EventID             INT
    , OrderID           INT
    , EventExpectedDate DATETIME2(7)
    , EventActualDate   DATETIME2(7)
    , EventTypeID       INT
    , EventSkip         BIT
    ...
    )

Or would you try a less normalized approach and flatten all the possible events out into one very wide events table that mimics what the .xls/.csv update files will look like?

CREATE TABLE dbo.Events (
    EventID             INT
    , OrderID           INT
    , Event_1_ExpDate   DATETIME2(7)  /* No, the actual names would not include 1, 2, 3 */
    , Event_1_ActDate   DATETIME2(7) /* The actual names would name type of Event */
    , Event_2_ExpDate   DATETIME2(7)
    , Event_2_ActDate   DATETIME2(7)
    ...
    , Event_20_ExpDate   DATETIME2(7)
    , Event_20_ActDate   DATETIME2(7)
    )

Or is there another pattern you would use?

Best Answer

Normalize always unless you have a very good reason not to. It shouldn't come down to a coin flip.

Neither of your proposed designs seems optimal to me for a variable or unknown number of events: since "that number may increase or decrease," both designs force you to change the schema and the code every time it does.

I also don't believe that you have to base your core table design around the format of the input files. This is why we have staging tables, ETL processes, SSIS, C#, etc. There are many ways to transform data from flat files into a format that is more appropriate for a relational database. Don't design your database because of what your flat files look like.

So you could have staging tables that look like the input files, but then you would load the data into real tables that look perhaps like this:

CREATE TABLE dbo.Events -- lookup table describing events
(
  EventID INT PRIMARY KEY,
  Description VARCHAR(32) NOT NULL UNIQUE
);

CREATE TABLE dbo.OrderEventLog -- actual event data
(
  EventID INT FOREIGN KEY REFERENCES dbo.Events(EventID),
  OrderID INT FOREIGN KEY REFERENCES dbo.Orders(OrderID), -- guess
  ExpectedDate DATETIME2(7),
  ActualDate DATETIME2(7),
  PRIMARY KEY (OrderID, EventID) -- each event happens at most once per order
);

Now when you have a new type of event, you just insert a row into the dbo.Events table; you can then insert related events into dbo.OrderEventLog without changing the schema of either table, and without changing the interface to your stored procedures or the bulk inserts into the staging tables. This also keeps the log table (and all of its indexes) nice and narrow, and lets you build indexes that favor searches for specific types of events.
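As a rough sketch of that workflow (the event name, the dbo.EventStaging table, and its column names are hypothetical, chosen to mirror the tables above):

```sql
-- Register a new event type: a row insert, no schema change.
INSERT dbo.Events (EventID, Description)
VALUES (21, 'Customs Cleared');

-- Load staged rows (dbo.EventStaging is a hypothetical staging table
-- shaped like the .csv file) into the narrow log table, skipping any
-- order/event pair that has already been recorded.
INSERT dbo.OrderEventLog (EventID, OrderID, ExpectedDate, ActualDate)
SELECT s.EventID, s.OrderID, s.ExpectedDate, s.ActualDate
FROM dbo.EventStaging AS s
WHERE NOT EXISTS
(
  SELECT 1 FROM dbo.OrderEventLog AS oel
  WHERE oel.OrderID = s.OrderID
    AND oel.EventID = s.EventID
);

-- An index that favors searches for a specific event type:
CREATE INDEX IX_OrderEventLog_EventID
  ON dbo.OrderEventLog (EventID)
  INCLUDE (OrderID, ActualDate);
```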

You can always flatten the data later - which is typically what you do when you get to a point where you are building a data warehouse around the data.
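For instance, a later warehouse load could flatten the log into one row per order with PIVOT (the EventID values and column aliases here are purely illustrative):

```sql
-- Pivot the narrow log into one wide row per order,
-- one actual-date column per event type.
SELECT OrderID,
       [1] AS Event_1_ActDate,  -- aliases would use real event names
       [2] AS Event_2_ActDate
FROM
(
  SELECT OrderID, EventID, ActualDate
  FROM dbo.OrderEventLog
) AS src
PIVOT
(
  MAX(ActualDate) FOR EventID IN ([1], [2])
) AS p;
```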