SSIS: Slowly changing dimension with lists (one-to-many relationship)

database-designmany-to-manyslowly-changing-dimensionssis

Slowly changing dimension (specifically type 2) is a brilliant concept for being able to keep historical periodic (e.g. daily) data without the storage overhead of taking an entirely new snapshot of the data every time it is updated. However, all examples I have seen so far assume my data for every element I am modelling is contained in one single row. For example, a person, and their name, age, and address. Now what happens when my source data has a multi-valued dimension (or in other words, a list)? Let's say I want to keep track of the number of cars someone owns over time. Expressed as tables, my source data would look like:

                         Car
Person                   --- 
------                   CarID
PersonID <-------------> PersonID
First Name               Year
Last Name                Make
Age                      Model
Street Address           Color
City
State

Yesterday, a person had 3 cars. Today, they have four. Maybe in the future, they have 2, but 2 that are completely different from the original 4 cars. How can I use SCD2 (Historical) to track this one-to-many relationship?

I can't find a lot online about how to do this, which probably means I am not using the correct search terms because I am positive others have solved this problem. Thinking of my own solution, I could imagine my Target SCD data is modeled as such:

PersonScd
---------               CarScd
PersonBK                ------
CarListSK <-----------> CarListSK
First Name              CarID
Last Name               Year
Age                     Make
Street Address          Model
City                    Color
State
EffectiveDate
ExpirationDate

This is a hybrid approach where if any item in a list changes from one time to the next, the list is completely rewritten with all the new details, and each list has its own ListSK. Therefore, in this example design:

  • Yesterday a person had 3 cars. This person has one row in the PersonScd table with an ExpirationDate of 9999-12-31. The CarScd table has three rows, each with the same CarListSK. The other details of each row in the CarScd table match those of the source Car table for this person. The person's CarListSK column holds the value of CarListSK that all three of the cars have in the CarScd table.
  • Today, this person now owns a fourth car. This means yesterday's row for this person expires (the ExpirationDate is now set at yesterday's date), and a new row is written to the PersonSK table for that person – the EffectiveDate is set to today and the ExpirationDate is 9999-12-31. Now, 4 new entries are written to the CarScd table, and all of these entries share a new CarListSK. Aside from the effective and expiration dates, the only difference between this person's row yesterday and today is the CarListSK column.
  • Now, as long as I join the PersonScd table to the CarScd table using the CarListSK column, I can know what cars a person owned on a certain date.
  • If any other attribute about this person changes in the future, such as his/her address, the new row written will retain the same CarListSK with 4 corresponding entries since his/her cars did not change.
  • If in the future the person now owns only 2 cars (whether or not they were the same as the previous 4), 2 totally new rows will be written in the CarScd table with a new common CarListSK, which the new PersonScd row will have for that person.

Assuming this is a feasible design, how do I build this behavior in SSIS? SCD in SSIS compares one row at a time, not lists at a time. I imagine I'd have to just use a script component.

If this design is flawed, what is the correct way to use Slowly Changing Dimension Type 2 with one-to-many lists?

Best Answer

I think that in order to keep the logical aspect of a dimension you should use a third table to relate both dimensions:

PersonScd                     GroupOfCars               CarScd
---------                     -------------              ------
PersonBK      <--------->                       <----->  PK_Car 
Unique_Person_code                                       Unique_Car_code
First Name                     Unique_Car_code            Year
Last Name                      Unique_Person_code         Make
Age                                                       Model
Street Address                                            Color
City                                                      EffectiveDate
State                                                     ExpirationDate
EffectiveDate                                             
ExpirationDate

* Unique_Car_code and Unique_Person_code are not foreign keys but unique_numbers used to identify a unique car or a unique person.