Postgresql – Time dimension or timestamp in fact table

data-warehousedimensionpostgresqltime

Which would you use, and why? A separate time dimension or putting a timestamp in a fact table? Or perhaps both?

I am building a data warehouse, and need to represent the time of day that events occur at, down to the one second granularity. I want to roll data up; plotting a graph of number of events per hour of the day, for example.

Kimball's "The Data Warehouse Toolkit", has a design for a time of day dimension. A more recent blog post recommends not doing that, and using a timestamp in the fact table instead:

http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/

If I go with timestamps in the fact table, will it still be easy/fast to roll up by the hour?

Any other trade-offs to consider in making this choice?

Best Answer

I would recommend including both on the fact table. The dimension should be used for filtering and grouping, while the time stamp value can be used in detailed reports/queries.

Unless you care about whether an event occurred at 8 seconds or 42 seconds past the minute, create your time dimension at the grain of 1 minute.

As you did not tag your RDBMS, I thought it prudent to mention that later versions of SQL Server do not allow date + time data type operations; i.e. recreating the time stamp from the date & time dimension business keys. A work around is convert(datetime, [date column]) + convert(datetime, [time column]), or similar.

Best Answer

Related Solutions

Sql-server – Datawarehouse Design: Combined Date Time dimension vs. Separate Day and Time dimensions and timezones

Data warehouse: using day number in Epoch in a date dimension

Related Question