Firstly apologies if this is a duplicate, I am fairly new to SQL and so Im not sure what the correct terminology to use in my searches
So I have a database which records motor races, with the following simplified schema
race_table
==========
race_id PK
race_date timestamp
average_speed Decimal
max_speed Decimal
drivers
=======
driver_id PK
driver_name text
date_started timestamp
driver_races
============
driver_id FK
race_id FK
If each driver has ~1000 races spread over 2/3 years
How would I go about querying the overall % change (positive of negative) in their average race speed for a given date range, for example
% Change in first 6 months
Joe Smith - 5% increase
Andy James - 4% increase
% Change in first 12 months
Joe Smith - 8% increase
Lewis May - 6% increase
UPDATE: More detail on % Change
The % Change could be calculated using linear regression, (least-squares-fit would be suitable), the average change is effectivly the y-difference on a line-of-best-fit, where each point is a race, x is the race_date and y is the average_speed for that race.
Postgres's regr_slope will give the gradient of the line which is effectivly the same as the %change
SELECT regr_slope(r.average_speed, EXTRACT(EPOCH FROM r.race_date)) as slope
FROM race_table as r, driver_races as dr
WHERE dr.race_id = r.race_id
AND d.driver_id = 1
This gives the figure I want, but I now need to apply is against all users, sorted by 'slope'
Best Answer
Data model
You are saving avg. and max. speed per race, not per driver? Normally this would be something like:
This is also how you should provide your data model: as
CREATE
scripts ready to use for testing.numeric
anddecimal
are the same in Postgres.Query
Your query extended to all drivers:
Simple alternative
A simple alternative would be to match the average speed of the first half of a time period to the average of the second half:
The
CASE
expression has noELSE
branch, because the defaultNULL
serves just fine:avg()
ignoresNULL
values.-> SQLfiddle demo.