I will be populating a Miles Per Gallon (MPG) table. It's coming from an odometer source.
It's currently set up as so:
id (primary_key)
, truck_num
, start_date
, end_date
, start_miles
, end_miles
, start_fuel
, end_fuel
, miles
, gals
, mpg
There seems to be some redundancy. The miles
is (end_miles - start_miles)
, ditto for gals
.
Should we have those miles
and gals
columns precalculated and stored in the database? It would definitely make querying easier, but at the expense of space. Same question for having the mpg
calculated. A computed column would slow things down, no?
What indexes would work best? There's about 3,000 trucks (records) inserted in a batch every week.
I'm using SQL Server 2008 R2.
Edit: A sample query that would I'd be using
-- find average mpg for since ytd
select m.truck_num, avg(mpg)
from mpg m
join truck t on t.truck_num = m.truck_num
where start_date >= @begin_of_year and end_date <= @today
group by truck_num
Best Answer
CHECK
constraints.Here is a pseudo-schema definition:
As for indexing the table, here's what I think:
start_date ASC
. This will satisfy yourWHERE
clause. You are always inserting data by increasingstart_date
, meaning your writes will always be sequential under this clustering scheme. You are also always querying bystart_date
so you satisfy your biggest query pattern as well. (3,000 inserts per week is nothing. Because you have such a low volume of inserts, you could even cluster onstart_date ASC, end_date ASC
.)truck_num
andINCLUDE mpg
. This should satisfy yourSELECT
,JOIN
, andGROUP BY
clauses. If you want toORDER BY mpg
, then makempg
part of the index key aftertruck_num
instead of justINCLUDE
-ing it.When you're done, test your setup as follows: