Designing an Accumulating Snapshot with Repeated Groups, need advise

data-warehousedatabase-design

Situation

I need to create a pie chart that has to be able to be represented at the following levels: District, Region, and School.

At the student level it has to show the trend of the student over a set of periods: "I1", "G1", "I2", "G2", "I3", "G3", "I4", "G4"

The pie chart will show a number and pct of students with 1,2,3, or none risks factors. Risk factors being: Academics, Discipline, and Attendance. I have been given a formula to determine if a student is at risk at any of the categories above.

We have a Ralph Kimball based approach Data Warehouse.

I have a working query at the school level that I am able to plug the given formula through subselects and sums/counts that returns me as follows:

STUDENT_ID  STUDENT_NAME  PERIOD SCHOOL_YEAR  ACD_RISK  DISC_RISK  ATT_RISK   
1           Viriato       I1     2010/2011    1         1          1  

1           Viriato       G1     2010/2011    1         0          0

2           Teodosio      I1     2010/2011    0          1         0

.....................

The query that produces me the total number of students having 1,2,3 or 0 risk factors by doing a count sum on the output above works fine as far as performance is concerned. However when I run it at the region and district levels, performance is not acceptable.

Solution

I need to create an Accumulating Snapshot

Concerns

Ralph Kimball in "The Complete Guide to Dimensional Modeling" chapter 12 page 244 states on Accumulating Snapshot:

A single row represents the complete history of something.

Solution A: Prior to Reading Kimball

So should I have a job that produces a table with the resultset above. But then I would have 8 rows per student per year. I would not have a single row like Kimball suggests.

Solution B: After Reading Kimball

student_id,

name, 

school_year, 

period, 

att_risk_I1, 

adm_risk_I1, 

disc_ris_I1, 

att_risk_G1, 

adm_risk_G1, 

disc_ris_G1,

.

.

.

.

att_risk_G4,

adm_risk_G4,

dis_ris_G4

As you can see with solution B I would have one row per student per year, and I could generate a trend graph for that student all the way from elementary school to high school with 12 rows (Grades 1st through 12th). However I am not sure I like having ATT_RISK, DISC_RISK and ADM_RISK repeated 8 times.

Which approach would you take?

Best Answer

It's not really clear what is asked here but I think your solution is simplest when you have one row per student. That allows further aggregating later however you want to do it.

So this is a vote for b.