MySQL – Maintain History of Records but Using Most Recent

database-designMySQLquery

Let's say I'm developing a database for a transportation service which owns certain vehicles and hires drivers which are assigned to specific vehicles. The database comprises of the following tables:

vehicles

id | number | brand | model | color | date_registered

drivers

id | passport_number | first_name | last_name | dob | address | date_registered

vehicle_drivers

id | vehicle_id | driver_id | date_assigned

daily_vehicle_reports

id | vehicle_id | milage | remaining_fuel | problems_detected | observations | recommendations | report_date

Where daily_vechicle_reports records the state of the vechile at the end of each day.

At the end of each month, I would generate data on all reports for the given month. Now my problem is, given that the possibility that at any given day, the driver of any vehicle can be changed, the data on the previous drive remains in the database, and normally, to generate the data I would use a query similar to the following:

SELECT
    dvr.*,
    v.number,
    v.date_registered,
    d.passport_number,
    d.first_name,
    d.last_name,
    d.dob,
    d.address
FROM daily_vehicle_reports dvr
INNER JOIN vehicles v ON(v.id = dvr.vehicle_id)
INNER JOIN vehicle_drivers vd ON(vd.vehicle_id = v.id)
INNER JOIN drivers d ON(d.id = vd.driver_id)
WHERE MONTH(dvr.report_data) = 4
AND YEAR(dvr.report_data) = 2019

This would give me repeated data for the reports are multiple drivers would be found for the same vehicle. How can I modify the query so as to include ONLY the last driver assigned by that date (the date of the report)?

Best Answer

This is well known problem "Max by group" that can be solved by subselect:

SELECT
       dvr.*,
       v.number,
       v.date_registered,
       d.passport_number,
       d.first_name,
       d.last_name,
       d.dob,
       d.address
  FROM daily_vehicle_reports AS dvr
  JOIN vehicles              AS v  ON v.id = dvr.vehicle_id
  JOIN vehicle_drivers       AS vd ON vd.vehicle_id = v.id

  JOIN ( SELECT vehicle_id                            -- here is additional join with subselect
              , MAX(date_assigned) AS date_assigned   -- last assignment for each vehicle
           FROM vehicle_drivers
          GROUP BY vehicle_id
       ) AS vdm  ON vdm.vehicle_id = vd.vehicle_id    -- joined to the full table `vehicle_driver`
                AND vdm.date_assigned = vd.date_assigned -- to exclude NOT LAST rows

  JOIN drivers               AS d ON d.id = vd.driver_id
 WHERE MONTH(dvr.report_data) = 4      -- Too bad, can't be indexed
   AND  YEAR(dvr.report_data) = 2019   -- The same thing

To speedup the filtering by date you have to use the following approach:

WHERE dvr.report_data BETWEEN '2019-04-01' AND '2019-04-30'

This kind of date range filtering is covered by index (if exists) and way more faster