Data Warehouse Design – Kimball Customer and Customer Address Dimensions

data-warehousedatabase-designsql serversql-server-2016

I have a Customer Table and Multiple Customer Addresses. A customer can have multiple addresses: home, work, vacation,etc.

In Kimball data warehouse, would I utilize a bridge table to go between Customer and Customer Address information?

(a)
So [DimCustomer] to [DimCustomerAddressBridge] to [DimAddress] Table?
Would this be the proper technique?

(b)
Or would I just have a Fact Table, like CustomerTransactionPayment Table, that connects the two Dimensions [DimCustomer] and [DimAddress]?

So one strategy is the connected Dims using bridge, and another is disconnected Dims connected by a Fact transaction table?

Best Answer

The right schema depends on how you intend to use this data. Let's talk about your options:

a) Bridge table.

I would not use this approach, because bridge tables are used to express N:N relationships. If you go for it, that would not be possible to precisely say what is the right address for your transaction; so it would be useless.

Following Kimball group -> "Bridge tables are used...where a many-to-many relationship can’t be resolved in the fact table itself..." https://www.kimballgroup.com/2008/09/design-tip-105-snowflakes-outriggers-and-bridges/

b) Create another dimension DimAddress (DimGeography?)

This looks a good approach, especially if this Address information could be related to other Fact tables. Maybe you could expand it into a DimGeography, or something like that. In this case, DimGeography and DimCustomer would not be related directly.

Other options

c) You could also change the granularity of your Customer dimension to the Address level; so you would have one row per address and the "master customer data" would be the same for the same customer in this dimension. -> Do that if most of your data is related to the Address level.

d) Another approach would be snowflaking your schema. Creating then a DimCustomerAddress that can be used on this fact; where your fact table points to DimCustomerAddress and DimCustomerAddress points to DimCustomer. In this way, you don't change your Customer table neither the other fact tables and can create the Customer -> Address hierarchy.

e) If you have to track historical address changes (based on comments)

This solution is similar to d), but the DimCustomerAddress would be an SCD type 2. This will make the granularity of your new DimCustomerAddress be the "address history". I.E: I would assume that your business key for this dimension is the Customer ID, AddressID and Address type, so you should add a new row everytime you get an address change; keeping tracking of the start and end dates that the address is valid. Your fact table should be related with DimCustomerAddress by the surrogate key "CustomerAddressKey", and DimCustomerAddress should relate with DimCustomer using "CustomerKey". The Start and End dates can be used during the ETL time to populate the Fact table to determinate the right "CustomerAddressKey". This example illustrates the case e)