Star Schema – One dimension or split

business-intelligencedata-warehousestar-schema

We have a Data Warehouse, which has two dimensions, Product and ProductLine. Product has just one entry and there is just one ProductLine with a description that changed from transport to Transport. Should product and product line be combined to one product dimension when a Star Schema is created out of the source data?

+-------------+-------------+-------------+------------------------+------------+------------+
| Product     |             |             |                        |            |            |
+-------------+-------------+-------------+------------------------+------------+------------+
| SurrogateID | BusinessKey | Description | ProductLineBusinessKey | ValidFrom  | ValidTo    |
+-------------+-------------+-------------+------------------------+------------+------------+
| 1           | 100         | Car         | 10                     | 01.01.1900 | 31.12.9999 |
+-------------+-------------+-------------+------------------------+------------+------------+

+-------------+-------------+-------------+------------+------------+
| ProductLine |             |             |            |            |
+-------------+-------------+-------------+------------+------------+
| SurrogateID | BusinessKey | Description | ValidFrom  | ValidTo    |
+-------------+-------------+-------------+------------+------------+
| 1           | 10          | transport   | 01.01.1900 | 15.10.2016 |
+-------------+-------------+-------------+------------+------------+
| 2           | 10          | Transport   | 16.10.2016 | 31.12.9999 |
+-------------+-------------+-------------+------------+------------+

A) One dimension table for product which contains product and product line?

+-------------+--------------------+------------------------+-------------+-----------------+------------+------------+
| DimProduct  |                    |                        |             |                 |            |            |
+-------------+--------------------+------------------------+-------------+-----------------+------------+------------+
| SurrogateId | BusinessKeyProduct | BusinessKeyProducsLine | DescProduct | DescProductLine | ValidFrom  | ValidTo    |
+-------------+--------------------+------------------------+-------------+-----------------+------------+------------+
| 1           | 100                | 10                     | Car         | transport       | 01.01.1900 | 15.10.2016 |
+-------------+--------------------+------------------------+-------------+-----------------+------------+------------+
| 2           | 100                | 10                     | Car         | Transport       | 16.10.2016 | 31.12.9999 |
+-------------+--------------------+------------------------+-------------+-----------------+------------+------------+

+-----------------------+--------------------+----------------+
| FactTable             |                    |                |
+-----------------------+--------------------+----------------+
| SurrogateIdDimProduct | SurrogateIdDimDate | MeasureSalesPc |
+-----------------------+--------------------+----------------+
| 1                     | 20161014           | 100            |
+-----------------------+--------------------+----------------+
| 2                     | 20161016           | 150            |
+-----------------------+--------------------+----------------+

B) Two dimension tables for product product and product line?

+-------------+--------------------+-------------+------------+------------+
| DimProduct  |                    |             |            |            |
+-------------+--------------------+-------------+------------+------------+
| SurrogateId | BusinessKeyProduct | DescProduct | ValidFrom  | ValidTo    |
+-------------+--------------------+-------------+------------+------------+
| 1           | 100                | Car         | 01.01.1900 | 31.12.9999 |
+-------------+--------------------+-------------+------------+------------+

+----------------+-------------+-------------+------------+------------+
| DimProductLine |             |             |            |            |
+----------------+-------------+-------------+------------+------------+
| SurrogateID    | BusinessKey | Description | ValidFrom  | ValidTo    |
+----------------+-------------+-------------+------------+------------+
| 1              | 10          | transport   | 01.01.1900 | 15.10.2016 |
+----------------+-------------+-------------+------------+------------+
| 2              | 10          | Transport   | 16.10.2016 | 31.12.9999 |
+----------------+-------------+-------------+------------+------------+

+-----------------------+----------------+--------------------+----------------+
| FactTable             |                |                    |                |
+-----------------------+----------------+--------------------+----------------+
| SurrogateIdDimProduct | DimProductLine | SurrogateIdDimDate | MeasureSalesPc |
+-----------------------+----------------+--------------------+----------------+
| 1                     | 1              | 20161014           | 100            |
+-----------------------+----------------+--------------------+----------------+
| 1                     | 2              | 20161016           | 150            |
+-----------------------+----------------+--------------------+----------------+

How should a Star Schema the Data Mart for this scenario be designed?
Obviously solution A has the advantage that just one join is needed while B leads to less entries in the product dimension. If there are many products the dimension could grow rapidly. Are there other considerations? What is best practice?

Sorry for the ugly "tables".

Best Answer

There is something to say for both approaches. I usually prefer to have this situation in a single dimension so you can define attribute relationships that allow the engine to calculate bitmap indexes.

If you do that, the query engine doesn't need to calculate the nonemptycrossjoin by actually calculating the combinations and see if they are empty or not but could use that index to see what products belong to a product line and just calculate the emptyness for those products.

If however you have multiple measure groups, and on some measure groups you can only link to the product dimension and on some other measure groups you can only link the productline dimension you could choose to have separate dimensions to avoid linking to a non-granularity attribute.