We are starting to design the building blocks of a data mart/warehouse and we need to be able to support all time zones (our clients are from all over the world). From reading discussions online (and in books), a common solution seems to be to have a separate date and time dimension as well as a timestamp in the fact tables.
However, the question I am having hard time answering is what good does the date and time dimensions actually do for me considering my dynamic time zone requirements? A time dimension makes a little more sense but I'm having hard time with the date dimension. A general design approach for a date dimension usually includes properties such as day name, day of week, month name, etc. The problem I am having with all that is that 11:00 PM on Tuesday, December 31, 2013 in UTC is Wednesday, January 1st, 2014 in all time zones that are after UTC+2.
So if I will have to do all these time zone conversions on each and every query (and report) then what is the point of having and storing these properties that I will probably never use (seems like)? Some people suggest having fact rows for each time zone but that seems ridiculous to me. We need to be able to store millions of records each month.
Others suggest having a time zone bridge table which although makes some sense, it also seems like extra complexity and extra joins to accomplish something that my client apps and reports should easily be able to figure out from a date (reporting will be primarily web-based where there are a myriad of libraries to aide in converting, displaying and formatting dates).
The only thing I can think of is the ease and possibly performance of grouping by date and hour but how bad of a practice is to group by datepart (we're using MS SQL but we will be querying millions of rows) or should we consider just extremely simple date and time dimensions with not much more than hour, day, month and year numbers for the most part as most literals such as Monday wouldn't mean much when time zones come into play?
Best Answer
Firstly...
Separating
Datime/Time
into aDate
dimension and aTime
dimension is definitely the way to go.To manage multiple time zones you need to duplicate the
DateKey
and theTimeKey
so that you have the following:LocalDateKey
LocalTimeKey
UtcDateKey
UtcTimeKey
You say...
By having the 4 columns I've listed above you, will be able to join the fact table to the Date and/or Time dimension Using Table Aliases (in Kimball terminology these aliased dimension tables are known as "Role Playing Dimensions"), so you would have something like the following:
In closing...
As you're building a data mart, and not an OLTP database, the generation of the Local and Utc times should be performed in your ETL, NOT in any client side applications for the following reasons (apart from localisation of the UTC time to the report reader's perspective):
StandardisedDateKey
, orCorporateHQDateKey
, where instead of a UTC date table you standardise based on some other business agreed standard