Which would you use, and why? A separate time dimension or putting a timestamp in a fact table? Or perhaps both?
I am building a data warehouse, and need to represent the time of day that events occur at, down to the one second granularity. I want to roll data up; plotting a graph of number of events per hour of the day, for example.
Kimball's "The Data Warehouse Toolkit", has a design for a time of day dimension. A more recent blog post recommends not doing that, and using a timestamp in the fact table instead:
http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/
If I go with timestamps in the fact table, will it still be easy/fast to roll up by the hour?
Any other trade-offs to consider in making this choice?
Best Answer
I would recommend including both on the fact table. The dimension should be used for filtering and grouping, while the time stamp value can be used in detailed reports/queries.
Unless you care about whether an event occurred at 8 seconds or 42 seconds past the minute, create your time dimension at the grain of 1 minute.
As you did not tag your RDBMS, I thought it prudent to mention that later versions of SQL Server do not allow
date + time
data type operations; i.e. recreating the time stamp from the date & time dimension business keys. A work around isconvert(datetime, [date column]) + convert(datetime, [time column])
, or similar.