Sql-server – Datawarehouse – Dimension, Fact, or Degenerative dimension

data-warehousedimensionolapsql-server-2008

I am building a warehouse for a billing system that bills a number of services/utilities on the same invoice, as such I am having real trouble conforming to a recognised design method for the warehouse. I think I still very much have an OLTP hat on.

As an example a single invoice could be produced containing charges for electricity, gas and mobile phones.

All three components of the bill, themselves have two components, fixed charges, and usage based charges. Even simplified to just a mobile phone it still means to store the level of detail required for the business I need 2 fact tables:

FactMobileFixedCharge
FactMobileCalls

Each of these go onto a single invoice, in addition to this there are transactions at an invoice level too, so I now need a further fact table:

FactInvoiceTransaction

The basic schema for this simple example is as follows:

enter image description here

All the reading I have done approaches the InvoiceLineItem example to resolve the issue of an invoice appearing like a fact and a dimension, where the facts of the invoice are essentially propogated to the child table, causing some redundancy but conforming to standard practises.

In the production billing system itself there are 15 tables that contain items that appear on an invoice, a summary table that contains invoice level charges and a summary of the charges in the other 15 tables by category, and the invoice table itself.

I don't feel that my scenario matches this though, because the invoice has a number of different things that also need to be reported on (Invoicing address, marketing material on invoice, payment method, invoice format, email, online or paper billing, balance brought forward, payments since the last invoice etc). While this is acceptable redundancy if I had one or two tables it seems like a lot for 15 fact tables.

In addition to being a lot of redundancy when there are multiple tables, I would also still need to link FactMobileCall and FactMobileCharge to get records for the same invoice, and I don't beleive a degenerate dimension is a good way to link the two.

Basically this all points towards Invoice being a dimension, especially as there are no measurements on an invoice level, however it still feels like it should be a fact as there will only ever be new records on each import.

Which leads to my questions:

  1. Am I correct in thinking that an Invoice in this scenario should be considered a fact?
  2. If so is it acceptable to have further facts referencing this fact, or is there a better way to approach this?
  3. I can't believe I am the only person who has encountered a situation where the InvoiceLineItem examples don't really seem to fit. Is there a recognised way of dealing with this?

For further information, my warehouse schema currently looks likes this (where I have conceded to Invoice being a fact like dimension):

enter image description here

Best Answer

This is how I would approach it myself. Remember to think of your Facts as "actions" or "verbs" and your dimensions as the the descriptors of your facts.

So your invoice and invoice line items are both facts. One approach to dealing with this is what you mentioned, making the fact table to the granularity of the invoice level line item. This will lead to redundancy of certain things like you mentioned. This if fine for things as an billing address dimension and certain invoice level facts such as a tax could even be broken down to the line level item and it could then aggregate back up to the invoice level amount.

This approach can break down if you have invoice level facts that don't break down nicely. For example, are customers allowed to have more than one payment method? Could there be more than one type of marketing material attached to the invoice? Does the marketing material correlate to specific line items?

Asking these questions can definitely lead you to creating multiple facts such as FactInvoiceLineItem, FactInvoice, FactPayments, etc. that you end up tying together with something like invoice number. Doing this will give you more flexibility to analyze specific facts better.

If doing this approach and you are building cubes off the fact tables, I would recommend using views to load your cubes. This way you could have a high level view for the invoice that joined your various fact tables and had measures on there such as number of line items, number of payments, etc.

All that to say, I believe going that route is much better than splitting out your fact tables that have the same granularity.

Feel free to respond with more details if you need more help hashing it out.