Reporting Table Design

data-warehousenormalizationreporting

I'm writing a system that allows users to store values against each week in October and November and each day in December – starting from the first Monday. I'm holding the raw data entered by the users in a transactional set of tables (3NF I believe). Administrators will be pulling regular reports of that data. As a result, each time the user makes a save I want to extract the data into a denormalised MI table, to make the reporting faster, and help reduce locking on the transactional tables.

Bearing in mind that different years have different numbers of weeks in Oct and Nov, and different number of days in Dec (because we start from first Monday), what is a good design for the MI table? So, something like:

OctW1
OctW2
..
NovW4
NovW5
DecMon1
DecTues2
...
DecMon5
DecTue5  

I've considered:

  • creating a separate table for each year (tblMI_2012, tblMI_2013 etc), so the schema for each year is only defined once, and all the fields will be populated on each row.
  • creating a column for each possible value (so 31 fields for all the December dates) and only populating them when saving values for the respective calendar year.

Which pattern is the best?

Also, should I be also saving the date with the value (effectively doubling the column count)

OctW1
OctW1Date
OctW2
OctW2Date
etc

Apologies if this is really obvious, my first foray into creating an reporting table.

Best Answer

I would do something like this:

period_start_on date
period_duration interval
position        int -- position of this row in the year,
                    -- e.g. 1 for first Monday of Oct,
                    -- 2 for 2nd Monday, etc.
values ...
primary key(period_start_on, period_duration)
check(position >= 1 and position < 40)

Then if you want December only, you use WHERE period_duration = '1d' and period_start_on >= '2012-12-01' AND period_start_on < '2013-01-01'. If you want to compare two years, you JOIN using position:

SELECT y1.*, y2.*
FROM   the_table y1
  JOIN the_table y2 USING (position)
WHERE y1.period_start_on >= '2011-01-01' AND y1.period_start_on < '2012-01-01'
  AND y2.period_start_on >= '2012-01-01' AND y2.period_start_on < '2013-01-01'