In star schemas, what should I do when a dimension has a date attribute

data-warehousedimension

I'm trying to learn about dimensional models and star schemas. Say I have a Sales fact table recording the total sales at retail stores, with, say, four dimensions, Date, Customers, Stores, and Promotions (as in sales promotions like coupons). (Pseudo-schema below).

I really like the idea of prepopulating the Date dimension with attributes like day of the week, so that you can easily filter the facts by seemingly complex conditions like "saturdays in the second quarter for the past 10 years".

Now, say I want the Promotion dimension to have Start_Date and End_Date attributes which denote the start and end of the promotion. How do I do this? I thought about it and can come up with three less than ideal solutions:

1) Make Start_Date and End_Date regular, atomic attributes (either ISO8601 strings or the DB's datetime object) and only allow limitted filtering.

2) Make Start_Date and End_Date foreign keys to the Date table, breaking the star schema, but allowing the same filtering capabilities as the full Date dimension.

3) Include attributes like Start_Date_Day_of_Week, Start_Date_Quarter, maintaining the star schema but increasing the dimension size and essentially duplicating the structure of the Date dimension in the Promotions dimension.

Sales Table
-----------
Date key      (FK to Dates table)
Promotion key (FK to Promotions)
Customer key  (FK to Customers table)
Store key     (FK to Stores)
Sales Amount  

Date Table
----------
Date Key (PK)
Month
Day of week
Day of month
Day of Quarter
Day of year
Holiday?
Quarter
Day since Epoch
Month since Epoch
etc...

Promotion Table
---------------
Promotion Key (PK)
Type (Coupon code, 2-for-1 sale, 50% off, etc)
Start_Date
End_Date

Customer Table
--------------
Customer Key (PK)
Name 
Age
Sex
etc...

Stores Table
------------
Store Key (PK)
Address
City State
Zip
etc...

Best Answer

Old question & I know the original posters are inactive, but since it came up in a Google search, I thought I'd add my thoughts.

1) Make Start_Date and End_Date regular, atomic attributes (either ISO8601 strings or the DB's datetime object) and only allow limitted filtering.

Putting a date attribute into an existing dimension makes sense if the main goal is informational (i.e. people want to include the start/end date of the promotion in reports). However, if the intention is to filter, then filtering on an atomic date attribute will be relatively painful.

2) Make Start_Date and End_Date foreign keys to the Date table, breaking the star schema, but allowing the same filtering capabilities as the full Date dimension.

This type of snowflaking is called an Outrigger Dimension and can be acceptable in a star schema. (http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/outrigger-dimension/). I imagine this is what you did.

3) Include attributes like Start_Date_Day_of_Week, Start_Date_Quarter, maintaining the star schema but increasing the dimension size and essentially duplicating the structure of the Date dimension in the Promotions dimension.

The duplication would get messy (maintaining multiple date fields, especially since you have multiple dates in the same dimension). It also makes future maintenance more complex: if you want to alter your date dimension, you have to repeat that alteration in dozens of places (or have an inconsistent user experience).

On the flip side, this might be a valid option if, for example, there are just one or two ways that you know people want to filter promotions on. For example, if you know end-users want to quickly filter to active promotions or "promotions closed in the last 30 days", you could calculate and add that as a specific attribute in the promotion dimension, rather than linking to a full date dimension for promotion start & end date. (Though what starts out as one or two attributes can expand quickly, as end-users find more and more time periods they want to filter to.)

There is a 4th and 5th option, depending on the specifics of what you intend to filter by promotion start/end dates:

4) Add the promotion start/end date fields as keys in the Sales table.

You'll be storing the same piece of information each time a promotion is used, so at first glance, it feels like a lot of duplication. But denormalization is the name of the game in a star schema. If your goal is to filter to sales based on the promotion start or end date (i.e. sales where the promotion start date was last month), I would recommend this over an outrigger dimension.

5) Create a new, factless fact table, for promotions

If your goal isn't to filter sales at all, then a factless fact table might be a better solution. This could tell you how many promotions were in effect on a given date (regardless of whether there were associated sales), for example. More information: http://www.kimballgroup.com/2011/04/design-tip-133-factless-fact-tables-for-simplification/

My experience with dimensional modeling is that the best solution often depends on what the end-user is trying to accomplish. If you (or they) are not yet sure what they're trying to accomplish (i.e. you know promotions have start & end dates and that's it), I'd start with option #1. This puts the information in your model without too much work.

If it later transpires end-users are trying to filter Sales by start & end dates, that's when I'd consider #2 or #4. If it transpires end-users are trying to analyze promotions independently of sales, I'd look at #5.

Best Answer

Related Solutions

Where should one put indexes in a time dimension table

Data warehouse: using day number in Epoch in a date dimension

Related Question