Is dimensional modelling useful for modelling tables of calculations

data-warehousedatabase-design

I have a set of tables that look a bit like this:

Key 1 | Key 2 | Key 3 | Var 1 | Var 2 | Var 3 | ... | Var 200
-------------------------------------------------------------
   X  |   Y   |   Z   |   2   |   3   |   5   | ... |    8

The first three columns form a compound key. The remaining columns are populated by a series of calculations such as:

Var 3 = Var 1 + Var 2
Var 4 = Var 3 + Var 2
Var 5 = Var 3 + Var 4
....

(This is a huge simplification of the formulae. But, in essence, there is a chain of calculations that builds on previously calculated items.)

So, when considering how to build a dimensional model, I can identify the components of the key as a dimension (or two), and the variables seem to be measures, but I am struggling to identify what the fact is. I wonder if there is a key insight into the dimensional modelling process that I'm missing, or whether it's just inappropriate for my situation.

EDIT: I am not certain what I would achieve by dimensional modelling either! The context is an existing application which processes 3NF tables, does calculations, and spits out 3NF tables. It is proposed that these tables are migrated into a data warehouse which is being built using the dimensional modelling methodology. I am struggling to understand how this might be of benefit, since the calculations are abstract mathematical and statistical computations, while dimensional modelling seems to be oriented around identifying the "business process" that gives rise to facts.

Best Answer

If you could meaningfully slice the data by key1, key2 or key3 then you could make dimensions out of them. In that case you would have separate dimension tables that had all the values of each key, and any relevant attributes (even just a human-readable description of what the value represents) or groupings (roll-ups of the key values).

Each dimension table would have the key value and its attributes, and would link to the fact table on its key.

If your values (Var X) are additive across rows (i.e. all the values added together for Key1='Foo' mean something) then you can use the keys to slice the data.

I don't think that relationships between the Var columns are really relevant to dimensional modelling unless you want to unpivot them and indentify them by a 'variable type' dimension rather than separate columns

Beyond that, the question is a bit vague to really get a clear view of what you are trying to achieve and why you want to make a dimensional model of your data in the first place. Maybe if you can clarify your goals a bit we can give you a more meaningful answer.

EDIT: Dimensional modelling is just identifying the axes by which you want to slice data in aggregate. Sometimes a dimension might be very simple - just a code. Sometimes it may be complex with lots of sub-attributes - such as an insurance policy. A dimension lets you slice data in aggregate, either by its key or by rolling up the data by some attribute of the dimension.

If the VarX columns are additive between rows - i.e it makes sense to aggregate (for example) var2 for two or more rows - then a dimensional structure might make sense for reporting. One key point of a dimensional model is that it tends to facilitate efficient queries in aggregate, and plays nicely with OLAP tools.

So, if I had values 'Foo', 'Bar', 'Wibble', 'Blarg' for key1 and values 'A', 'B', 'C' for key2 then I might have a few rows that looked a bit like:

Key1      Key2       Var1       [. . .]
--------- ---------- ---------- ----------
Foo       A          10         . . .
Bar       A          20         . . .
Foo       B          50         . . .

If I wanted to see the total of Var1 for all Key2='A' then the reporting tool would allow me to slice on Key2 and add up all the rows. If I wanted to see the total of Var1 for Key1='Foo' then I could do something similar. In this instance the total of Var1 for Key2 = 'A' would be 30.

If this is the sort of thing you want to do the for data then a dimensional model could help, and would allow you to put up a statistical reporting facility with an off-the-shelf reporting tool such as Report Builder.

A dimension table for Key1 might look like

Key1    Key1Desc                 Key1Grouping
------- ------------------------ -------------
Foo     Description of 'Foo'     Group A
Bar     Description of 'Bar'     Group A
Wibble  Description of 'Wibble'  Group A
Blarg   Description of 'Blarg'   Group B

By joining this dimension table to the fact table you can then roll up by Key1Grouping and see stats for (say) 'Group A.' You can calculate ratios of Vars after the base additive measures have been aggregated.

Again - I'm still a bit unsure of quite what you want to know.

Related Solutions

Database Design – Hierarchy for Dimensional Model

My take on this is that you may perhaps want to try using a parent-child dimension, so the model will allow for a theoretically infinited amount of levels. That way, you can connect facts to each level of the hierarchy.

Here's an example:

CREATE TABLE dimGeography (
    id
    parent_id
    geographyType
    geographyCode
    geographyName
    PRIMARY KEY (id)
);

CREATE TABLE factDemographics (
    date
    geography_id
    monthlyIncome
    workingPopulationAge
    numberOfPeople
    PRIMARY KEY (date, geography_id)
    FOREIGN KEY (geography_id) REFERENCES dimGeography (id)
);

The advantages of this approach are:

you're free to add or remove levels as you like over time without changing the data model
you can connect facts (demographics) to each level
no duplicate measure columns, i.e. just one column called "number of people"
future-proof: some countries or regions may use different types of levels - for instance, not all countries use "state".

The disadvantages that I can think of are:

you may have to redesign your existing (?) dimension or fact table
your BI tool may not be able to handle parent-child dimensions
more complex SQL queries when aggregating the facts on higher levels

I haven't the slightest idea if this is the "correct" solution according to Kimball, Inmon, etc. Instead, I would urge you to see if it applies to your particular needs. ;)

Dimensional Model for inheritance in dimensions

While in a traditional OLTP system you may need to be concerned about normalizing and inheritance, in a typical data warehouse system you will want to denormalize to accelerate query responses.

In this case, I would design an Employee_Dimension table which had all relevant columns about an employee which you might want to query against, along with an additional column which I might call employee_role, which I would fill in with entries like Driver, Manager, etc. and then have another column which had either the truck_number filled for employees who were a driver, or leave the column as NULL when they aren't drivers. Of course, you can add even further columns to include information for other employee roles.

Long story short, if you're building a typical star schema, you can achieve faster query results by using heavy denormalization, at the cost of storage space of course.

Best Answer

Related Solutions

Database Design – Hierarchy for Dimensional Model

Dimensional Model for inheritance in dimensions

Related Question