Business Logic in Data Warehouse

business-intelligencedata-warehousedimensional-modeling

my question is:

Should business logic be stored in the data warehouse data store? Or should it only exist in the reporting applications?

To give a more concrete example:

In the CRM operational system there are abstract concepts of leads and opportunities, and lead statuses and opportunity stages. These concepts are very common, well understood, and used across different CRM tools.

And business users can legitimately ask questions like, what is the conversion rate for all leads from status "raw lead" to status "inside sales qualified lead".

Then for the data warehouse designer, the question becomes:
1. When someone changes a lead status from x to y after a call, should that be a row in a fact table somewhere? (perhaps in the same row as the call)
If yes, then we are storing abstract business concepts, rather than just real world events, in a fact table.
2. And the bigger question is, should the data warehouse even be aware of terms like "leads" and "opportunities"?
If yes–
2. Should the fact table have columns for old_status and new_status?
3. Or, should status change be handled as a slowly changing dimension?

If business logic is stored directly in the data warehouse, a lot of business questions become easier to ask. (lead status conversion metrics, opportunity stage advancement velocity metrics, and etc) But it seems more brittle to changes, is more complex to implement, and perhaps pollute the "verbs are facts and nouns are dimensions" paradigm.

How should I approach this design, and what are the best practices and guiding principles here?

Max

Best Answer

My answer is a big YES - you should store business logic in data warehouse. That is one of the ideas of data warehouses in the first place. If you have tens of reports showing or filtering leads imagine if a rule to qualify someone as lead would change. Also, what if you have to access DWH data with different tools/systems - you'd need to replicate all the logic. Lastly, calculating this kind of labels can be time consuming and that's why you'd like to have everything precalculated in DWH for the reports to run fast.

To answer your first question - yes, you should go with SCD. There are couple apporaches and you should choose the one that suits best your requirements - it's all about how users want to analyze data (this is where the desing should start.