I'm trying to design a star-schema fact table along with some dimension tables that will surround it. If I reuse the natural key called customer_key
in both the fact_table
and dim_customer
then I don't see a difference between calling something dim and a lookup table. Additionally if I every need to update the customer_name
I will loose historical data that represented this fact at the time of the recording. What am I missing when trying to model the dim and fact tables?
I would like to understand difference between relational "lookup" table technique and data warehouse "dimension" table?
fact_table dim_customer dim_product
------------- ------------ -----------
customer_key customer_key product_key
product_key name name
units_sold email description
unit_price
Please forgive any ignorance I might show in this question. I am a data warehouse newbie.
Best Answer
If you only have those tables there will be little difference between your datawarehouse star schema and your actual schema.
You probably have a much more complex relational schema though, where you also have customer groups or item types and your schema looks more like
fact table -> customer table -> customer group table
fact table -> item table -> item type table
When you are working towards a star schema you would then denormalize your schema so you include the customer group description in your customer dimension table and your item type description in your item dimension table.
The basis of a star schema is that you have a fact table, and every dimension is a single table just a single step away from the fact table.
The way kimball describes it:
If you don't denormalize your dimensions into single dimension tables you end up with a snowflake schema. Snowflakes may seem like a good idea, but they often result in poorer query performance and can get you in all kinds of trouble later on, especially if you plan to build a cube on top of your data warehouse.
Again Kimball has a post on it:
With regards to your question about changing the customer data and losing history, in a datawarehouse that's solved by using "slowly changing dimensions".
You basically have several types of slowly changing dimensions, but you are probably looking for "type 2" which largely comes down to storing a valid_from and valid_to date on your dimension row so you can select the row that was valid at the time of the transaction.
Type 2 SCD's
It's a broad subject to cover but I hope this sets you on the right track for the basics and you can use the correct terminology in the future.
The kimball site is a great resource (and many call it the reference) for dimensional modeling so I suggest you browse the glossary and the design tips section