Boolean as measure in fact table or dimension/attribute value

data-warehousedimensional-modeling

My fact table consists of complaints with typical dimensions of who, what and when. We have a target that the complaint should be responded to within a certain time period.

I'm not sure on how I best to model whether the complaint had a response within that time period.

I can store both the target value and the result in the fact table as integers with the result as a measure. Or I could use a dimension which has values of yes/no instead of representing this as a measure, I could then use a calculated measure in my cube. Or I could use a combination of both.

Are there any pros, cons and gotchas modelling the facts in the ways described above?

I anticipate that this fact table will be used to get the total number of complaints, the number of complaints responded to in time, the percentage responded in time and also identify individual complaints which were not responded to in time.

Best Answer

Here is a flexible way to store it that should support changes in requirements in the future.

  • Store the target in a dimension like dim_complaint_type. That way if in the future you have different types of complaints – e.g. Billing, Technical, ... – each of them can be categorized and have different targets. By applying slowly changing dimensions type 2 on the target attribute, you can also change your target from one year to another without changing the result for past complaints. You can also do "what-if" analysis by changing the target in one dimensions and see how much more/less are now on-time.
  • Store the response time in the fact table, as the elapsed_time (integer) or as the registration_date and closing_date as suggested in the comments. My preference goes for registration_date and closing_date with a calculated column elapsed_time which is registration_date - closing_date.

For the result, I wouldn't store it as a flag (yes/no) but as a number. If you call the column on_time and put a 1 for each complaint that meets the target, you can easily SUM on that column to have the total of on_time complaints over a period.

I see three ways of getting the result :

  • Not storing it, but compute it on the fly in your reporting by having the same formula everywhere to check if fact_complaint.elapsed_time is lower than or equal to dim_complaint_type.target or not. It might be dangerous if different persons use different formulas. If you have a tool with a metadata layer (e.g. OBIEE, SAP BO, ...), you can define that result measure once for all.
  • Create a view that joins fact_complaint and dim_complaint_type and add that result measure in the select list. It would be defined once for all. But you need to be careful with that approach as some reporting tool might not act in the same way with a table or a view. The database optimizer might also not be able to do all its magic the same way.
  • Actually store it in your fact table (i.e. compute it at ETL time). This is probably the best approach in term of performance. But it requires to update that column if you change the target for some past complaints.