Sql-server – Dimension Help – Deciding Fact or Dimension

database-designslowly-changing-dimensionsql serverssas

We have the following Dim and Facts:

Customer Dim: SCD Type 2, Info about the customer ie first purchase date, name, address, etc
Product Dim: SCD Type 2, about our products

Customer Snapshot Fact: Monthly Fin facts about the Customer
Product Sales Fact: Sales by Customer

and will have many more facts that involve the customer Dimension.

We have a legacy DB that was collecting 100's of data fields about a customer and have been asked to DW this data. There are 100 plus fields related to a customer and may of them are flags that indicate if the customer qualifies for something or not. The majority of the queries the users will want against any of our fact table may include the filtering and or grouping of these indicators.

The question is should we add 100 plus indicators to our Customer Dim and if not how should the data be structured so this information can be joined with all of our other facts.

Thanks for your help.

Best Answer

Put the indicators on the customer dimension.

This means that any fact table that joins against the customer dimension has access to all of the indicators. If they are on the dimension then you can trivially make them available to any fact table that links to the customer dimension.

If you just need a fact table with counts of customers that roll up by the dimension attributes, then you can create a 'factless fact table' that just has a single fact - a 'QTY' column with a value of 1. This allows counts of customers to be grouped by any of the attributes.

If Customer is a slowly changing dimension, consider putting an additional row with -1 in the QTY column into the fact table every time a type 2 change is made. This should link to the previous version of the dimension, with the current version having an additional row with a 'QTY' of 1. This allows you to track statistics on changes in customer attributes over time.

A cube can consume this change over time for the counts by implementing a calculated measure that does a running sum in 'QTY' from beginning up to the selected date. If necessary you could also build a snapshot table.

If the number of attributes and volume of changes becomes unwieldy then you have what Kimball calls a 'Rapidly Changing Monster Dimension' in his first book. In this case, consider pulling the attributes out into a separate junk dimension that has a row for each distinct combination of values. You will still want some scaffolding to link this to the actual dimension rows so you can copy the junk dimension key onto any fact tables you have the customer dimension key on.

A cube can consume either structure, so it is less likely to be an issue for the cube so much as an issue for the ETL processing.