Situation
I need to create a pie chart that has to be able to be represented at the following levels: District, Region, and School.
At the student level it has to show the trend of the student over a set of periods: "I1", "G1", "I2", "G2", "I3", "G3", "I4", "G4"
The pie chart will show a number and pct of students with 1,2,3, or none risks factors. Risk factors being: Academics, Discipline, and Attendance. I have been given a formula to determine if a student is at risk at any of the categories above.
We have a Ralph Kimball based approach Data Warehouse.
I have a working query at the school level that I am able to plug the given formula through subselects and sums/counts that returns me as follows:
STUDENT_ID STUDENT_NAME PERIOD SCHOOL_YEAR ACD_RISK DISC_RISK ATT_RISK
1 Viriato I1 2010/2011 1 1 1
1 Viriato G1 2010/2011 1 0 0
2 Teodosio I1 2010/2011 0 1 0
.....................
The query that produces me the total number of students having 1,2,3 or 0 risk factors by doing a count sum on the output above works fine as far as performance is concerned. However when I run it at the region and district levels, performance is not acceptable.
Solution
I need to create an Accumulating Snapshot
Concerns
Ralph Kimball in "The Complete Guide to Dimensional Modeling" chapter 12 page 244 states on Accumulating Snapshot:
A single row represents the complete history of something.
Solution A: Prior to Reading Kimball
So should I have a job that produces a table with the resultset above. But then I would have 8 rows per student per year. I would not have a single row like Kimball suggests.
Solution B: After Reading Kimball
student_id,
name,
school_year,
period,
att_risk_I1,
adm_risk_I1,
disc_ris_I1,
att_risk_G1,
adm_risk_G1,
disc_ris_G1,
.
.
.
.
att_risk_G4,
adm_risk_G4,
dis_ris_G4
As you can see with solution B I would have one row per student per year, and I could generate a trend graph for that student all the way from elementary school to high school with 12 rows (Grades 1st through 12th). However I am not sure I like having ATT_RISK, DISC_RISK and ADM_RISK repeated 8 times.
Which approach would you take?
Best Answer
It's not really clear what is asked here but I think your solution is simplest when you have one row per student. That allows further aggregating later however you want to do it.
So this is a vote for b.