Dimensional Hierarchy – Best Way to Capture Historical Changes

change-data-capturedata-warehousedimensional-modelingetl

What is the cleanest way to capture changes between levels in a dimensional hierarchy?

I have the dimensional hierarchy [Area] > [Region] > [Location], where Area is the parent to Region and Region is the parent to Location. Location is lowest level of the hierarchy, and is associated with the fact table.

The business rules are: Locations are able to change the Region they are assigned to based on business need. Furthermore, Regions are able to change the Area they are assigned to.

We want to capture those changes so that when we do historical analysis, we can compare business metrics before and after a Location changed its Region, and a before and after a Region changed its Area.

What is the best way to do this?

The four ways that I have come up with all seem to have major drawbacks:

  1. Using a bridge table between hierarchy levels requires that you use time bounds (effective start/end dates) to correctly identify which parent a child belongs to as you walk up the hierarchy. That makes queries prone to error, especially with hierarchies with many levels.
  2. Versioning a child record whenever its relationship to its parent changes (e.g., versioning Region A when it moves from Area A to Area B) and then cascading that change all the way down the hierarchy (e.g., creating new versions of Locations tied to Region A so that they are tied to the new Region A).
  3. Updating the child record's reference to the parent record, making you use time bounds to reassemble the correct history (prone to error).
  4. Always use time bounds when querying up the hierarchy to make sure Location A is associated with the correct Region (likewise, Region A is associated with the correct Area) for the moment of the fact being analyzed.

None of these seems clean to me.

Is there a better way?

Thanks a lot

Best Answer

I highly recommend the 1st option of a bridge table. If you make sure the you capture the historical change on all three tables that would work fine, though I'd agree with you that the JOIN'ing is an issue. Just make sure that (in the JOIN clause) that the DateInserted and ValidTill match the ones that are in the fact table. More about bridge tables in here - Kimball university

ALTERNATIVELY: In the fact table keep references to all three dimension (Area, Region, Location). Since anyway you need to have a DATE on a fact table, you can map the changes easily, and since - as you said- Location could be A for one transaction but B for a different one, you'd be better off this way GROUPing BY and seeing trends and changes. The down side: it will make the fact table fatter.

If you are using SSAS (or intend to use it for data modelling) I recommend the very good read The Many to Many revolution which explains how to implement it to the end user.