Star Schema – Benefits of Having Time Dimension

data-warehousedatabase-designschemastar-schema

What would be the benefits of having a time dimension in a star schema over having the time attributes in the fact table itself?

For example:

I have a transaction data with user information for each transaction, country where the transaction took place and dates of when it occured.

Option 1
Correct me if I am wrong, but this is probably the widely used approached and most recommended by many:

  • A transaction fact table containing transaction_ID (PK), user_id (FK) and country_id (FK), and date_id (FK)

  • User dimension containing user_id (PK) and the other user attributes, let's say name & phone_number.

  • Dates dimension that consists of date_id (PK), date, day, month, year, quarter.

Option 2
Something that I just thought about instead of choosing Option 1, but unsure about:

  • A transaction fact table containing transaction_ID (PK), user_id
    (FK) and country_id (FK), date, day, month, year,
    quarter.

  • User dimension containing user_id (PK) and the other user attributes, let's say name & phone_number.

What would be the benefits of having Option 1 over Option 2? I am not aware of the reasons why joining with another Date dimension would be a better option even though it is most widely used approach. Thanks a lot!

Best Answer

Let me answer this question with a scenario starting with a simple Transaction table. When our business started, management wanted to know the 'name' of the month, so I've included that information in the table.

DECLARE @Transactions TABLE (
    TransactionId INT
    ,UserId VARCHAR(10)
    ,CountryId INT
    ,TransactionDate DATE
    ,[MonthName] VARCHAR(20)
    ,SalesAmount DECIMAL(18, 2)
    )

Business has been good and we already have 1 million rows in our Transactions table. In fact, business is so good that management is now asking more in depth questions about our sales. They wanted to know what 'quarter' the sale was made.

ALTER TABLE Transactions ADD [QuarterName] VARCHAR(10)
UPDATE Transactions SET QuarterName = ... 

We just updated 1 million rows.

As time goes by, management starts asking more and more questions about our sales.

  • What DayOfTheWeek was that sale made?
  • Was that a holiday?
  • Was the moon full on that day?

ALTER TABLE Transaction ADD ...

UPDATE TABLE SET ...

Hopefully you can see where this is going. Additionally, all of that redundant data on each and every Transaction row can contribute to reduced performance and increase resource utilization (memory, disk space, etc.). Our databases are bigger and take longer to back up. All of the redundant data takes up memory.

With a Date Dimension table, all of that information is stored in one place. A Date Dimension table with dates from 2000-01-01 to 2100-01-01 contains just 36525 rows. Anytime we want to track a new attribute of a date, we only have to alter that table by adding the additional attribute and update 36525 rows.

When we want specific information about the 'Date' attributes of a sale, we simply join up against the Date Dimension table

Additionally, the data in a Date Dimension is consistent. January is spelled correctly, Saturday is spelled correctly, etc. Storing this kind of data in the Transaction table can lead to all kinds of discrepancies with incorrect spellings, etc.

For more information on the creation of a Date Dimension table, check out Creating a date dimension or calendar table in SQL Server