seeing as no one has helped me with this i'll answer myself to help anyone who is looking at the same thing. It looks as if to enable CDC on an existing warehouse you need to do this on a table by table basis, recommending capturing all the ETL in package per table. Also, it is recommended to enable CDC state per table to avoid conflicts. Recommended approach would be --> enable CDC on source DB. Transfer incremental loads to staging DB or schema inside DWH. Incrementally load records to Dimensions and Facts. Every example i can find on net is for one table, which is fairly useless... be much appreciated if someone can find a CDC example for multiple tables...
When working with addresses, one thing you may consider -- at least give it a serious look -- is using the USPS to convert the user-supplied address into the USPS standard address. So the same address, just with different abbreviations or punctuations, will converge to the same address. There are several remote apis the Post Office makes available to your app.
As for the database, yes, the literature shows that the place to put a 1-to-many FK is in the "many" tuple. You have also stated you want to maintain a history of address changes so this format makes even more sense.
However, there are many advantages to having a 1-to-many intersection table. This is similar to a many-to-many intersection table except that one side is made part of the PK to enforce the "1" side of the 1-to-many cardinality.
create table OneToMany(
OneID int not null
references UserTable( ID ),
ManyID int not null
references AddressTable( ID ),
TypeID char( 1 ), -- 'H' for Home, 'W' for work, etc.
constraint PK_OneToMany primary key( OneID, TypeID )
);
Each user has only one entry for each kind of address and even if multiple users share the same address, if one user moves to a different address, the other users are not affected.
There is a small alteration needed to maintain the history of changes. All that is needed is the addition of a datetime field.
create table OneToMany(
OneID int not null
references UserTable( ID ),
ManyID int not null
references AddressTable( ID ),
TypeID char( 1 ), -- 'H' for Home, 'W' for work, etc.
Effective datetime not null default NOW,
constraint PK_OneToMany primary key( OneID, TypeID, Effective )
);
Now all changes are maintained and it is very easy to look back at any time in the past to see what the address was at that time. Here is an answer I posted that will refer you to more information. Mine is not the accepted answer, and I would highly recommend you take the time to become familiar with both mine and the accepted answer. Choose the one (or design your own with what you learn) that is best for your situation.
Best Answer
The right schema depends on how you intend to use this data. Let's talk about your options:
a) Bridge table.
I would not use this approach, because bridge tables are used to express
N:N
relationships. If you go for it, that would not be possible to precisely say what is the right address for your transaction; so it would be useless.Following Kimball group -> "Bridge tables are used...where a many-to-many relationship can’t be resolved in the fact table itself..." https://www.kimballgroup.com/2008/09/design-tip-105-snowflakes-outriggers-and-bridges/
b) Create another dimension DimAddress (DimGeography?)
This looks a good approach, especially if this Address information could be related to other Fact tables. Maybe you could expand it into a DimGeography, or something like that. In this case, DimGeography and DimCustomer would not be related directly.
Other options
c) You could also change the granularity of your Customer dimension to the Address level; so you would have one row per address and the "master customer data" would be the same for the same customer in this dimension. -> Do that if most of your data is related to the Address level.
d) Another approach would be snowflaking your schema. Creating then a DimCustomerAddress that can be used on this fact; where your fact table points to DimCustomerAddress and DimCustomerAddress points to DimCustomer. In this way, you don't change your Customer table neither the other fact tables and can create the Customer -> Address hierarchy.
e) If you have to track historical address changes (based on comments)
This solution is similar to d), but the DimCustomerAddress would be an SCD type 2. This will make the granularity of your new DimCustomerAddress be the "address history". I.E: I would assume that your business key for this dimension is the Customer ID, AddressID and Address type, so you should add a new row everytime you get an address change; keeping tracking of the start and end dates that the address is valid. Your fact table should be related with DimCustomerAddress by the surrogate key "CustomerAddressKey", and DimCustomerAddress should relate with DimCustomer using "CustomerKey". The Start and End dates can be used during the ETL time to populate the Fact table to determinate the right "CustomerAddressKey".