Sql-server – TOTAL column or VIEW that computes

database-design, sql-server

I've got a couple of tables to track volunteer workers at our non-profit.

One stores Volunteers and tracks information about a volunteer such as the person's name, a contact number, primary location and such.

Another table stores Tasks and tracks specific work done by the volunteers. It references the Volunteers table by ID to allow me to, say, pull back all of the tasks performed by Pete.

Tasks have a number of hours so we can track how much work was done on tasks at a certain location, how many total hours were volunteered at that location, etc.

I'd like to keep track of the cumulative amount of volunteer time for each volunteer, and this is where a design decision needs to be made.

I could add a column to the Volunteers table called VolunteerHours and then add triggers to the Tasks table to call a stored procedure to recompute the volunteer hours when rows are added/updated/deleted on the Tasks table.
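For concreteness, the column-plus-trigger route might look roughly like the sketch below. The trigger name and the VolunteerHours column are placeholders, the table and column names are taken from the view further down, and the recompute is inlined rather than delegated to a stored procedure. It is only an illustration of the moving parts, not a tested implementation.

```sql
-- Assumed schema: Volunteer(VolunteerID, VolunteerHours),
--                 Tasks(VolunteerID, TaskVolunteerMinutes)
create trigger trg_Tasks_RecomputeHours   -- hypothetical name
on Tasks
after insert, update, delete
as
begin
    set nocount on;

    -- Recompute the stored total for every volunteer touched by this
    -- statement; inserted/deleted can hold many rows, so work set-based.
    update v
    set VolunteerHours = coalesce(
            (select sum(t.TaskVolunteerMinutes)
             from Tasks t
             where t.VolunteerID = v.VolunteerID), 0) / 60.0
    from Volunteer v
    where v.VolunteerID in (
        select VolunteerID from inserted
        union
        select VolunteerID from deleted
    );
end
go
```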

Alternatively, I've considered creating a view that would be used to retrieve the totals on demand.

create view VolunteerSummary
as 
    select v.VolunteerID, SUM(TaskVolunteerMinutes) as VolunteerTotal
    from Volunteer v
    join Tasks t on t.VolunteerID = v.VolunteerID
    group by v.VolunteerID

go

select * from VolunteerSummary

The view approach is attractive to me because I'm not storing a total column on the table, rather computing the total when I need it.

Assumptions I have made about these tables:

Fairly small (< 1000) number of volunteers
Most volunteers volunteer less than 12 times a calendar year (once a month)

Based on my assumptions and description, is there a good reason to go with the column/trigger/stored procedure route over the view?

Best Answer

I would go with the view for a few reasons:

It is obvious how the sum is generated with a view and it would always be correct.
You won't be able to fully trust the data in a stored, trigger-updated column; i.e., life happens. (The importance of this depends on the consequences of bad data.)
The stored column is essentially a premature optimization in a relatively lightly used system.

I don't think there is a good reason given the circumstances you described to go with the column/trigger/stored proc at this time.
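One thing worth checking with the view as posted: the inner join leaves out volunteers who have no tasks yet. If those should appear with a total of zero, a variant along these lines may help. This is a sketch only; the view name VolunteerSummaryAll and the output column names are made up for illustration, and it assumes TaskVolunteerMinutes is a numeric column on Tasks.

```sql
create view VolunteerSummaryAll   -- hypothetical name
as
    select v.VolunteerID,
           -- sum() over no matching rows yields NULL, so fall back to 0
           coalesce(sum(t.TaskVolunteerMinutes), 0)        as VolunteerTotalMinutes,
           coalesce(sum(t.TaskVolunteerMinutes), 0) / 60.0 as VolunteerTotalHours
    from Volunteer v
    left join Tasks t on t.VolunteerID = v.VolunteerID
    group by v.VolunteerID
go

select * from VolunteerSummaryAll
```

With fewer than 1000 volunteers and roughly a dozen tasks each per year, computing this on demand should stay cheap.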

Related Solutions

Sql-server – How accurate is the sys.partition.rows column

Books Online states that the rows field "indicates the approximate number of rows in this partition." I would therefore expect it to be close, but not 100% accurate, 100% of the time.

Michael Zilberstein reports an example of sys.partitions being wildly incorrect in For want of a nail. Not saying it is a common occurrence, but it is possible.

sys.dm_db_index_physical_stats contains a record_count field that appears to be more accurate. Be aware, though, that running this DMV may cause a REDO blocking issue if you run it on an instance hosting an AlwaysOn Readable Secondary Replica.

The explanation for the record_count field shows the following info:

Total number of records.

For an index, total number of records applies to the current level of the b-tree in the IN_ROW_DATA allocation unit.

For a heap, the total number of records in the IN_ROW_DATA allocation unit.

For a heap, the number of records returned from this function might not match the number of rows that are returned by running a SELECT COUNT(*) against the heap. This is because a row may contain multiple records. For example, under some update situations, a single heap row may have a forwarding record and a forwarded record as a result of the update operation. Also, most large LOB rows are split into multiple records in LOB_DATA storage. For LOB_DATA or ROW_OVERFLOW_DATA allocation units, the total number of records in the complete allocation unit.
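To see how the three numbers compare on a given table, something like the following could be run. It is a sketch under assumptions: dbo.Tasks is a placeholder table name, and the DMV is called in DETAILED mode because record_count is NULL in the default LIMITED mode. DETAILED mode reads every page, so it can be slow on large tables.

```sql
-- dbo.Tasks is a placeholder; substitute the table you want to check.

-- 1. Approximate count from metadata (heap or clustered index only).
select sum(p.rows) as approx_rows
from sys.partitions p
where p.object_id = object_id(N'dbo.Tasks')
  and p.index_id in (0, 1);

-- 2. record_count at the leaf level of the in-row data.
select sum(ips.record_count) as record_count
from sys.dm_db_index_physical_stats(
         db_id(), object_id(N'dbo.Tasks'), null, null, 'DETAILED') ips
where ips.index_id in (0, 1)
  and ips.index_level = 0
  and ips.alloc_unit_type_desc = 'IN_ROW_DATA';

-- 3. Exact count for comparison.
select count_big(*) as exact_rows
from dbo.Tasks;
```

On a heap with forwarded records, the second number can be higher than the third, for the reason given in the quoted explanation.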

See also Martin Smith's answer to a similar question on Stack Overflow.

Sql-server – Turning multiple fields in column to row

You're looking for a pivot. Either of these queries will work: SqlFiddle

/* case */

select
      CustomerId
    , FirstName = max(case when [Key] = 'FirstName'  then Value end)
    , LastName  = max(case when [Key] = 'SecondName' then Value end)
    , Age       = max(case when [Key] = 'Age'        then Value end)
    , Gender    = max(case when [Key] = 'Gender'     then Value end)
from Customers
group by CustomerId 


/* pivot */

select CustomerId, FirstName, SecondName, Age, Gender
  from (Select CustomerId, [Key], Value from Customers) c
    pivot ( max(Value)
      for [Key] in (FirstName, SecondName, Age, Gender)
    ) as p

Schema Setup for SqlFiddle:

create table Customers (
    id int identity (1,1) not null primary key
  , CustomerID int not null
  , [Key] varchar(32) not null
  , Value varchar(32)
)

insert into Customers (CustomerId, [Key], Value) values
    (2,'FirstName','Tim')
  , (2,'SecondName','Skold')
  , (2,'Age','48')
  , (2,'Gender','Male')
  , (3,'FirstName','Sql')
  , (3,'SecondName','Zim')
  , (3,'Age','32')
  , (3,'Gender','Male')
