Let's say I'm developing a database for a transportation service which owns certain vehicles and hires drivers which are assigned to specific vehicles. The database comprises of the following tables:
vehicles
id | number | brand | model | color | date_registered
drivers
id | passport_number | first_name | last_name | dob | address | date_registered
vehicle_drivers
id | vehicle_id | driver_id | date_assigned
daily_vehicle_reports
id | vehicle_id | milage | remaining_fuel | problems_detected | observations | recommendations | report_date
Where daily_vechicle_reports records the state of the vechile at the end of each day.
At the end of each month, I would generate data on all reports for the given month. Now my problem is, given that the possibility that at any given day, the driver of any vehicle can be changed, the data on the previous drive remains in the database, and normally, to generate the data I would use a query similar to the following:
SELECT
dvr.*,
v.number,
v.date_registered,
d.passport_number,
d.first_name,
d.last_name,
d.dob,
d.address
FROM daily_vehicle_reports dvr
INNER JOIN vehicles v ON(v.id = dvr.vehicle_id)
INNER JOIN vehicle_drivers vd ON(vd.vehicle_id = v.id)
INNER JOIN drivers d ON(d.id = vd.driver_id)
WHERE MONTH(dvr.report_data) = 4
AND YEAR(dvr.report_data) = 2019
This would give me repeated data for the reports are multiple drivers would be found for the same vehicle. How can I modify the query so as to include ONLY the last driver assigned by that date (the date of the report)?
Best Answer
This is well known problem "Max by group" that can be solved by subselect:
To speedup the filtering by date you have to use the following approach:
This kind of date range filtering is covered by index (if exists) and way more faster