MySQL – Handle Multiple Rows with Same Maximum Value

greatest-n-per-groupMySQL

When I run the following query,

SELECT engines, manufacturer, model, MAX(seats)
FROM   planes
GROUP BY engines;

I am getting the correct engines and seats results but not the correct manufacturer, model combination. In addition, there are multiple rows with the same maximum value for seats number which I need but just getting one result by engines/seats. I have viewed other Stack Exchange posts and elsewhere but cannot seem to find a good solution to fix the query. Any advice?

Best Answer

Yours appears to be a "greatest N per group" problem. What you can do is get the maximum seats per engine results:

SELECT
  engines,
  MAX(seats) AS max_seats
FROM
  planes
GROUP BY
  engines

and, using them as a derived table, join them back to the source to get the rows matching the maximums:

SELECT
  p.engines,
  p.manufacturer,
  p.model,
  es.max_seats
FROM
  planes AS p
  INNER JOIN
  (
    SELECT
      engines,
      MAX(seats) AS max_seats
    FROM
      planes
    GROUP BY
      engines
  ) AS es
  ON p.engines = es.engines
  AND p.seats = es.max_seats
;

Related Solutions

MySQL – Update Table Based on Last Data from Another Table with Grouping

The correct query is:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT
            plate_uid, brand, model, date_validated
        FROM
            lp_pictures_backup as parent
        WHERE
            brand <> '' AND
            date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY
                     plate_uid)
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

Just some little explanations. You need INNER JOIN because you must update row of lp_plates_backup only if plate_uid exists into lp_plates_backup. ORDER BY is useless because you are selecting all rows, order is not important.

You need the max date_validated of rows grouped by plate_uid, select MAX(date_validate) must individuate only a row for plate_uid through the date_validated field. So you need to add group by to select max(..).

The query select plate_uid returns a row for plate_uid so you don't need to aggregate here. The condition of single row is already builded into select max subquery.

I hope you did not get confused by me :-)

Updated

The previous works well if date_validate is the pair (plate_uid, date_validated) is unique. If you have this kind of data:

| plate_uid | brand    | model    | date_validated      |
|         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   <- changed data
|         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |   
|         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|         a | BMW      | 3-Series | 2014-11-07 14:34:11 |

The following query returns the first two rows for plate_uid 1.

SELECT plate_uid, brand, model, date_validated
FROM lp_pictures_backup as parent
WHERE
      brand <> '' AND
      date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY plate_uid)

The update will choice the values from the first or the second row. I think of the following alternatives:

using some other lp_pictures_backup fields to choise between rows with the same date_validated.
enforcing a unique constraint on table. I.e. alter table lp_pictures_backup add unique index (plate_id, date_validated). Rejecting invalid data.
detecting valid lp_pictures_backup pair of (plate_uid, date_validated). Updating lp_plates_backup only with valid pairs, review invalid pairs and correct them.

Maybe there are more alternatives. I prefer enforcing contraint on data so to have better data. I expand 3th alternative. Just create a view to define what a valid pair (plate_uid, date_validated) is:

CREATE VIEW lp_pictures_backup_valid as
SELECT plate_uid, date_validated
FROM lp_pictures_backup as parent
WHERE date_validated = (
        select max(date_validated) 
        from lp_pictures_backup t2 
        where t2.plate_uid =  parent.plate_uid GROUP BY plate_uid
      )
group by plate_uid, date_validated
having count(*) = 1;                    <-- you can change this to make an invalid row

A valid pair (plate_uid, date_validated) is a pair with the max date_validated only if there are a unique date_validated value.

I rewrite the update statements to consider only valid pair:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        -- modification start
        SELECT p.plate_uid, p.brand, p.model, p.date_validated
        FROM lp_pictures_backup as p
             INNER JOIN lp_pictures_backup_valid valid 
                        ON p.plate_uid = valid.plate_uid and 
                           p.date_validated = valid.date_validated
        WHERE p.brand <> '')
        -- modification end
    ) 
    AS m ON m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

Hope this make sense.

Update: 2014-03-20

In the first case:

Blockquote 1. using some other lp_pictures_backup fields to choise between rows with the same date_validated.

I have assumed your date is like this:

|id| plate_uid | brand    | model    | date_validated      |
|4 |         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|3 |         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   
|2 |         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|1 |         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |    
|2 |         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|1 |         a | BMW      | 3-Series | 2014-11-07 14:34:11 |

You can try this:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT t1.plate_uid, t1.brand, t1.model, t1.date_validated
        FROM lp_pictures_backup as t1,
          (SELECT t2.plate_uid, MAX(id) as id,  MAX(date_validated) as dv
           FROM lp_pictures_backup as t2
           GROUP BY t2.plate_uid) as t3                 
        WHERE t1.brand <> '' AND
              t1.plate_uid = t3.plate_uid AND
              t1.date_validated = t3.dv AND
              t1.id = t3.id
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

The fields used to choise the row to be updated are extracted by this part:

 ...
 (SELECT t2.plate_uid, MAX(id) as id,  MAX(date_validated) as dv
  FROM lp_pictures_backup as t2
  GROUP BY t2.plate_uid) as t3 
 ...

So it assumes a correlation between date_validated and id: at increasing dates corresponds to increasing id.

Hope it help.

MySQL – How to Select Max(Date) from Union Query

So it turns out that confluence uses a different structure to manage this data and now the discrepancies appear to be resolved.

Here is the query:

select
    User_Name,
    First_Name,
    Last_Name,
    max(Last_Login) as Last_Login
FROM
(
SELECT
    u.user_name as User_Name,
    u.first_name as First_Name,
    u.last_name Last_Name,
    ifnull(from_unixtime((cast(a.attribute_value AS UNSIGNED)/1000), '%Y-%m-%d'), '1970-01-01') AS Last_Login
FROM jira.cwd_user u
  LEFT JOIN jira.cwd_user_attributes a ON u.id = a.user_id AND a.attribute_name = 'login.lastLoginMillis'
  JOIN jira.cwd_directory d ON u.directory_id = d.id
where u.active = 1
UNION
 SELECT 
    u.user_name, 
    u.first_name as First_Name, 
    u.last_name as Last_Name, 
    date_format(l.successdate, '%Y-%m-%d') as Last_Login 
  FROM confluence.logininfo l 
  JOIN confluence.user_mapping m ON m.user_key = l.username 
  JOIN confluence.cwd_user u ON m.username = u.user_name 
  JOIN confluence.cwd_directory d ON u.directory_id = d.id and u.active = 'T' 
UNION
select
    u.user_name as User_Name,
    u.first_name as First_Name,
    u.last_name Last_Name,
    ifnull(from_unixtime((cast(sa.attribute_value AS UNSIGNED)/1000), '%Y-%m-%d'), '1970-01-01') AS Last_Login
FROM stash.cwd_user u
LEFT JOIN stash.cwd_user_attribute sa ON u.id = sa.user_id AND sa.attribute_name = 'login.lastLoginMillis'
JOIN stash.cwd_directory d ON u.directory_id = d.id
where u.is_active = 'T'
) as z
group by User_Name, First_Name, Last_Name
having (max(Last_Login)) <= date_sub(now(), interval 90 day)
order by User_Name asc, Last_Login desc

Best Answer

Related Solutions

MySQL – Update Table Based on Last Data from Another Table with Grouping

MySQL – How to Select Max(Date) from Union Query

Related Question