Date dimension OLAP cube design

olappentahoschemastar-schema

I have a sales reporting system with incoming data like this

date,editionID,outletId,Measure1,Measure2
2013-01-01,2,532,11,24

etc…

I am new to designing data cubes (for the pentaho BI suite) and when I was building the schema, everything was working fine until I wanted to define date as a date dimension. The pentaho schema designer needs this to be a foreign key reference to a date dimension table in order to use it as a dimension.

My question is, why do I need a separate table? How would I structure it? how would I reference it in the fact table?

Thanks!

Best Answer

Date dimensions are pretty standard in a data warehouse, and are highly recommended by Kimball as most facts tie to a date. Typically, the key is an integer. It can be a meaningless surrogate key, or it can be a "smart" key where the integer is in the form yyyymmdd; e.g., the key for August 2, 2014 would be 20140802.

Date dimensions provide a set of contiguous dates in multiple formats and allow you to do date calculations once rather than in each query. They make it very easy to do time period comparisons. You can add other fields that could be analytically relevant such as holidays, indicators of work days, fiscal calendars (where different from standard calendar years).

There are lots of scripts available online to create and populate a date dimension. Many tools will create the date dimension for you. I'm not sure what you are using for your underlying data source, so here are a couple of examples of date dimensions. Hopefully you can convert these to the appropriate format for your needs.