SCD Type 2 Dimension -> Is this the correct layout for a type 2 scd with this type of data present

slowly-changing-dimensionssas

Here is an example picture of my layout.

enter image description here

As you can see I have my SCD types present (status/startdate/enddate/businesskey).

My key is a surrogate key that identifies each record.

My problem is that my hierarchy seems to error out and it may be due to an incorrect layout. Here is my current key column structure.

Hierarchy

LineOfBusinessId (key column: LineOfBusinessId) (name column: LineOfBusinessName)

WorkerDivisionId (key column: LineOfBusinessId, WorkerDivisionId) (name column: WorkerDivisionName)

WorkerId (key column: LineOfBusinessId, WorkerDivisionId, WorkerId) (name column: WorkerName)

This dimension errors out because a workername changes and when the select distinct occurs during full processing it finds duplicates. I am just wondering what the best way of setting up the key columns is for this specific situation. Do the names need to be part of the key columns or is my hierarchy completely incorrect. I was sure I had this set up correctly but I am obviously questioning myself now.

I have received some advice that the best thing to do in this situation is to use the surrogate key as part of my key columns, but then found out that this will cause issues as I only want one record to show up when querying my SCD. Later, I received some more advice saying that I shouldn't use the surrogate in the id key columns but in the name key columns, specifically at the Worker level.

I am rather skiddish of updating all of my cubes with this setup without hearing from someone who has made a hierarchy of this type. When I look at this setup, I have to think that others have made hierarchies of this type as it is rather simple. Any advice/direction on this is greatly appreciated.

Thanks.

Best Answer

Making SSAS hierarchies on slowly changing dimensions is a bit of a fiddle. You need to make surrogate keys for each historical version at each level of the hierarchy. Then the key has the actual business facing name, which the user selects or reports by.

As an example, imagine worker BloggsJ in Division1, which is in LineOfBusiness1. Now Division1 gets moved to LineOfBusiness2. Logically you have the Division entity with two rows now:

DivisionKey     Division     LineOfBusinessKey
          1     Division1                   11 
          2     Division2                   12
          3     Division1                   12

and

LineOfBusinessKey    LineOfBusiness
               11    LineOfBusiness1
               12    LineOfBusiness2

Now, we have worker BloggsJ, who is assigned to Division 1, which is subsequently moved

WorkerKey  WorkerName   DivisionKey   Division   LineOfBusinessKey   LineOfBusiness
      101     BloggsJ             1   Division1                 11  LineOfBusiness1
      102     BloggsJ             3   Division1                 12  LineOfBusiness2
      103     SmithF              2   Division2                 12  LineOfBusiness2

In this case the keys remain in a strictly hierachical order:

LineOfBusiness2 (Key=12) has two children: Division2 (Key=2) and Division1 (Key=3). Division1 (Key=3) has one child: BloggsJ (Key=102) and Division2 (Key=2) has one child: SmithF (Key=103).
LineOfBusiness1 (Key=11) has one child: Division1 (Key=1), which has one Child: BloggsJ (Key=101)

Displaying the name in the cube allows you to build a hierarchy that can support drill-down operations. You will also probably want to hide the base attributes for this hierarchy and display another set with just the names, so there is something in the dimension that will produce a clean, unique list of members at each level without opaque, confusing repeated names.

*****Update 05012012 3:22 PM CST

Here is an image of my data example.

enter image description here

Related Solutions

SSAS Dimensions Hierarchy causes deployment errors

I've often found errors reported from SSAS indicating a duplicate key value in the dimension. When I've reviewed the source data, the duplicates aren't found. While I can't shed light on why this error is popping up in the dimension, I can offer a solution that bypasses the error and allows the cube to finish deploying and processing.

If you open the dimension object and look at the properties, you'll see a section for ErrorConfiguration. If you change this to "custom", you can tell SSAS what to do when it encounters errors in processing the object. If you want to log the error and allow the processing to continue, use "ReportAndContinue" as the value for properties like KeyDuplicate, KeyNotFound, NullKeyConvertedToUnknown, NullKeyNotAllowed - whatever is appropriate for your design. I set the KeyErrorLimit to 1 and set the KeyErrorLimitAction to StopLogging so your log files don't get overloaded with the same or similar errors.

As I said, this doesn't SOLVE the issue, but it does allow the cube to continue processing and sometimes, I've found, that's enough.

Sql-server – How to make a nullable “parent” hierarchy level

You actually have 2 questions in one question. If you create a new question for the attributes it would be neater and I'll cut/paste half of this as an answer there :)

Nullable Parent Level

You probably don't want NULLs in your OLAP dimensions, and Kimball seems to agree.

Nulls should also be avoided when we can’t provide a value for a dimension attribute in a valid dimension row. There are a several reasons why the value of a dimension attribute may not be available:

Missing Value – The attribute was missing from the source data.
Not Happened Yet – The attribute is not yet available due to source system timing issues.
Domain Violation – Either we have a data quality issue, or we don’t understand all the business rules surrounding the attribute. The data provided by the source system is invalid for the column type or outside the list of valid domain values.
Not Applicable – The attribute is not valid for the dimension row in question.

It kind of depends if you have an ETL process and a Data Warehouse or not how you should be handling them, but there are different types of 'not found'.

Think about the difference in a foreign key, one has an empty field, another has a field that's filled but the related record can't (or no longer) be found. I like to diffentiate between BLANK and DATA ERROR in my dimension.

In your example you could differentiate between 'no price code' and 'a price code I can no longer find'

If you have an ETL process with a Data Warehouse you can handle that easily in your ETL process, if you don't you would need some case statements in your DSV queries.

This question seems to reveal issues with the underlying Data Warehouse. There are arguments for and against both star and snowflake schema's but personally I tend towards a star schema, with some snowflake mixed in when necessary.
In any case the data cleansing and missing links needs to be solved in your Data Warehouse long before you reach the dsv.

Slowly Changing Dimension attributes

With regards to your Slowly Changing Dimension I don't see how the data type of hierarchies or keys in your dimension would change because the dimension somehow is SCD, that doesn't matter at all. You just need a validity rule somewhere in your ETL that gets picked up by your SSAS dimension definition (See here). But for any dimension key you create I suggest you use a surrogate key mostly because your surrogate key can be an int or bigint instead of a varchar and that could massively improve performance even for attribute keys.

Using a numeric key column instead of a string key column or a composite key will improve the performance of attributes that contain many members. This best practice is based on the same concept as using surrogate keys in relational tables for more efficient indexing. You can specify the numeric surrogate column as the key column and still use a string column as the name column so that the attribute members appear the same to end-users. As a guideline, if the attribute has more than one million members, you should consider using a numeric key.

Off course that numeric key would be a representation of 'the attribute' and not necessarily include the validity fields. The validity of the record is specified at the record in your dimension table, but as you state isn't necessary for your attribute keys.

For example this could be your dimension data

+---------------+-------+-----------+----------+
| DIMENSION_KEY |  NAME |  NAME_KEY |  CURRENT |
+---------------+-------+-----------+----------+
| 1             | tom   | 1         | y        |
| 2             | mat   | 2         | n        |
| 3             | mat   | 2         | y        |
+---------------+-------+-----------+----------+

Where you can pick dimension_key for the key of your key attribute and you could pick either name or name_key as the key of your name attribute.

Determining if it's worth the hassle for name depends on how many members your attribute will have (and your key attribute typically has most members).

In the end there isn't really any relation between the fact that you have a SCD and your decision what key is a good pick for your attribute. End user requirements make that decision for you. In the example dimension you would want all sales by mat reported under mat, and not have 2 mat's in your members when users report on that.

Best Answer

Related Solutions

SSAS Dimensions Hierarchy causes deployment errors

Sql-server – How to make a nullable “parent” hierarchy level

Related Question