Sql-server – Best way to design this mileage table

database-designschemasql-server-2008

I will be populating a Miles Per Gallon (MPG) table. It's coming from an odometer source.

It's currently set up as so:

  id               (primary_key)
, truck_num
, start_date
, end_date
, start_miles
, end_miles
, start_fuel
, end_fuel
, miles
, gals
, mpg

There seems to be some redundancy. The miles is (end_miles - start_miles), ditto for gals.

Should we have those miles and gals columns precalculated and stored in the database? It would definitely make querying easier, but at the expense of space. Same question for having the mpg calculated. A computed column would slow things down, no?

What indexes would work best? There's about 3,000 trucks (records) inserted in a batch every week.

I'm using SQL Server 2008 R2.

Edit: A sample query that would I'd be using

-- find average mpg for since ytd
select m.truck_num, avg(mpg)
from mpg m
join truck t on t.truck_num = m.truck_num
where start_date >= @begin_of_year and end_date <= @today
group by truck_num

Best Answer

  • Computed columns are your friend. Use them to capture simple calculations you're going to do anyway, and to guarantee that the computed values are correct.
  • Persist the results if you want to index or filter on them; don't persist them if you just need to pull the value once in a while.
  • Capture all data constraints using CHECK constraints.

Here is a pseudo-schema definition:

  id               PRIMARY KEY
, truck_num

, start_date
, end_date         CHECK (end_date > start_date)

, start_miles      CHECK (start_miles >= 0)
, end_miles        CHECK (end_miles > start_miles)

                   -- what if they refill the tank?
, start_gals       CHECK (start_gals >= 0)
, end_gals         CHECK (end_gals < start_gals AND end_gals >= 0)

-- all these should be computed
, miles = end_miles - start_miles
, gals = start_gals - end_gals
, mpg = miles/gals

As for indexing the table, here's what I think:

  • Cluster on start_date ASC. This will satisfy your WHERE clause. You are always inserting data by increasing start_date, meaning your writes will always be sequential under this clustering scheme. You are also always querying by start_date so you satisfy your biggest query pattern as well. (3,000 inserts per week is nothing. Because you have such a low volume of inserts, you could even cluster on start_date ASC, end_date ASC.)
  • Create a non-clustered index on truck_num and INCLUDE mpg. This should satisfy your SELECT, JOIN, and GROUP BY clauses. If you want to ORDER BY mpg, then make mpg part of the index key after truck_num instead of just INCLUDE-ing it.

When you're done, test your setup as follows:

  1. Create a test table
  2. Pump it full of test data
  3. Create the indexes
  4. Update statistics
  5. Run your most common queries
  6. Check their plans and run times