Sql-server – Datawarehouse Design: Combined Date Time dimension vs. Separate Day and Time dimensions and timezones

data-warehousedimensional-modelingsql serverssasssis

We are just starting design for a new data warehouse and we're trying to design how our date and time dimensions will work. We need to be able to support multiple timezones (probably at least GMT, IST, PST and EST). We were initially thinking that we would have one wide combined date time dimension down to maybe 15 minute granularity, that way we have one key in our fact tables and all the different date time data for all supported timezones are in one dimension table.
(i.e. Date Key, GMT Date, GMT Time, IST Date, IST Time, etc…)

Kimball suggests to have a separate day dimension from the time of day dimension to prevent the table from growing too large (The data warehouse toolkit p. 240) which sounds fine however that would mean we have two keys in our fact tables for each time zone we need to support (one for the date and one for the time of day).

As I'm very inexperienced in this area I'm hoping someone out there knows the tradeoffs between the two approaches, i.e. performance vs. the management of all the different time zone keys. Maybe there are other approaches too, I've seen some people talking about having a separate row in the fact table per timezone, but that seems like a problem if you fact tables are millions of rows then you need to quadruple it to add time zones.

If we do the 15 minute grain, we'll have 131,400 (24 * 15 * 365) rows per year in our date time dimension table which doesn't sound too horrid for performance but we won't know for sure till we test some prototype queries. The other concern with having separate time zone keys in the fact table is that the query has to join the dimension table to a different column based on the desired timezone, perhaps this is something that SSAS takes care of for you, I'm not sure.

thanks for any thoughts,
-Matt

Best Answer

Having the date and time separate will allow you to do aggregates by time much easily. for eg: if you want to run a query to find what time period of the day is most busy. This is much easily performed using a separate time dimension.

Also, you should just have one timekey. Decide on either GMT/ EST time - then use this in the fact table. If you need to run reports based off the other timezone, just convert it in your application or query.