Sql-server – Data Model For Summarizing Student Info

database-designsql server

I work for a school district and one of the many things the admin wants to know is how many tardies a student has for a given time frame. I could certainly go retrieve the count from the attendance table, but wondered about summarizing the data and if that might be faster/better. So, I've conisdered a table that looks something like:

create table stu_summary (
id int identity primary key nonclustered not null,
stu_id char(10) not null primary key references student (stu_id),
group_name varchar(50) not null,
item_name varchar(50),
value varchar(100))

which would then allow me to have data that looked like:

1, 'abc112233', 'tardy-count', '2011-F-21-99383', 2

which would signify that this record shows that student abc112233 has 2 tardies for the academic year 2011 in fall for progress period 21 and class id 99383.

Is there a better way to store this kind of info? I'm hoping to keep the table flexible and not tie it down to the specific data columns. Am I headed in a bad direction? What have you done to store this kind of date span specific summaries?

Edit
There has been speculation as to the amount of data, number of tables in question here, so let me illuminate:

  • There are 70k rows of attendance recorded each day, 1.2m per month
  • Each student's tardy count will be displayed in an attendance window with a grid. This is important because a student normally has 6 classes and so their attendance will be loaded frequently throughout the day
  • The attendance table has a code column in it that determines what kind of attendance it is. The tardy code is one of those values.

Best Answer

Calculate those counts on the fly.

For most data sets of this nature, I would imagine calculating the count of tardiness events would be relatively cheap. How many rows do you expect there to be in the long run in your students or attendance tables? Probably on the order of tens of thousands at most. Contrast that to calculating an account balance from its transaction history in database for a large bank with billions of transactions. In this case I would consider persisting a summary of these aggregations somewhere.

If you wanted to pre-calculate this data anyway, I would summarize the count of tardiness events per student per year in an indexed view as suggested in the top answer here.

For example:

CREATE VIEW dbo.tardiness_summary
WITH SCHEMABINDING
AS
SELECT
     student_id
   , year
   , COUNT_BIG(*) AS tardy_count
FROM dbo.attendance
GROUP BY 
     student_id
   , year
;

CREATE UNIQUE CLUSTERED INDEX IX_tardiness_summary
ON dbo.tardiness_summary (
     student_id
   , year
;

This is not a flexible approach, however, as this view is now schema-bound to the base tables and thus any modifications to either the view or the table will require rebuilding the view. Indexed views also have many restrictions on how they may be created or queried. Their advantage is that they guarantee the summary table will stay in sync with its sources because the database engine is now doing this work for you.