Sql-server – SSIS – SSAS: Missing data handling

nullsql serverssasssis

I have a solution in SSDT with a SSIS and a SSAS project, with missing values in both numeric and text columns.

Is it a good practice converting all the missing values of these columns to NULL during SSIS and then handle in SSAS?

Is there a way to ignore all NULL entries during every query in a SSAS OLAP Cube, besides defining calculated measures etc. for each possible query?

Best Answer

The way you handle null or missing data depends on where you have your missing data.

If you have missing data in fact tables in your measure columns you should probably leave them as null so they don't get considered in the NONEMPTYCROSSJOIN queries sent by most reporting applications.

If you have missing data in your foreign keys in your fact table (i.e. when your ETL process looks up the surrogate keys in your dimension table but the related row doesn't exist in your dimension) you need to decide based on your user requirements, but the general approach is to convert them to blank, 'unknown, invalid or whatever your requirements are.

If you have blank attribute values you should probably leave it as blank as to not confuse the users.

One thing to keep in mind specific to SSAS is that using error configurations in SSAS to let SSAS handle missing dimension keys has a huge impact on processing performance so you should probably solve that in your ETL process. The (unknown) dimension member is also very confusing for your end users as they don't know what it means when seen on a report and often forget to include it in filters (they have to select the unknown member on every dimension or they will be missing data).

As usual Kimball has a best practice on handling missing data in your dimensional model: Design Tip #128 Selecting Default Values for Nulls

So in your case I wouldn't convert them to NULL but to something meaningful when it comes to dimension keys in your fact table (just insert a row in your dimension with a surrogate key and a meaningful value such as -1, 'missing') and leave them as null or convert them to null in fact table measures so they play well with aggregates and NONEMPTY queries.