MySQL – Getting a distinct list of records based on the MAX of a column

I have a status table (tblTestActionStatus) with three columns:

ID_TestAction: references some test action
ID_Status: a look up table reference for the variety of possible statuses for the action
StatusDateTime: a datetime field which logs the exact time a status update was made for the test action.

While a test action is running, periodic status updates are logged to this table.

I want to know the current status of each test action, that is, its most recent status update as of the moment I run the query.
For example, test action #100 has had four status changes so far: first 1, then 2, then 3, and most recently 4. Test action #101 has had three status updates so far.

So I would like to write a query that returns all the columns in the table, but only for the row with the most recent StatusDateTime for each test action.

I attached a pic which shows the table contents and the rows I would like to see coming back from the query highlighted.

[Image: contents of tblTestActionStatus, with the rows the query should return highlighted]

Best Answer

This answer was originally posted by the OP as an edit to the question; it is now reposted as a Community Wiki answer.

I was able to find an answer with some more searching in the archive.

https://stackoverflow.com/questions/1049702/create-a-sql-query-to-retrieve-most-recent-records

The SQL I built from that post, which works in my case, is as follows:

SELECT tblTestActionStatus.ID_TestAction,
       tblTestActionStatus.StatusDateTime,
       tblTestActionStatus.ID_Status
FROM tblTestActionStatus
INNER JOIN
    (
        SELECT MAX(StatusDateTime) AS LatestDate, ID_TestAction
        FROM tblTestActionStatus
        GROUP BY ID_TestAction
    ) AS SubMax
ON tblTestActionStatus.StatusDateTime = SubMax.LatestDate
AND tblTestActionStatus.ID_TestAction = SubMax.ID_TestAction;
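
To see the query above in action, here is a minimal sketch that runs it against sample data. It makes several assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL (this particular query uses only standard SQL), the timestamps are invented, and the alias t and the trailing ORDER BY were added for readability and a stable result order.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblTestActionStatus ("
    "ID_TestAction INTEGER, ID_Status INTEGER, StatusDateTime TEXT)"
)

# Invented sample data: action 100 has four updates, action 101 has three.
sample_rows = [
    (100, 1, "2020-01-01 10:00:00"),
    (100, 2, "2020-01-01 10:05:00"),
    (100, 3, "2020-01-01 10:10:00"),
    (100, 4, "2020-01-01 10:15:00"),
    (101, 1, "2020-01-01 11:00:00"),
    (101, 2, "2020-01-01 11:05:00"),
    (101, 3, "2020-01-01 11:10:00"),
]
conn.executemany("INSERT INTO tblTestActionStatus VALUES (?, ?, ?)", sample_rows)

# Join each row to the per-action maximum timestamp; only the latest rows match.
LATEST_STATUS_SQL = """
SELECT t.ID_TestAction, t.StatusDateTime, t.ID_Status
FROM tblTestActionStatus AS t
INNER JOIN (
    SELECT ID_TestAction, MAX(StatusDateTime) AS LatestDate
    FROM tblTestActionStatus
    GROUP BY ID_TestAction
) AS SubMax
    ON t.ID_TestAction = SubMax.ID_TestAction
   AND t.StatusDateTime = SubMax.LatestDate
ORDER BY t.ID_TestAction
"""

latest = conn.execute(LATEST_STATUS_SQL).fetchall()
for row in latest:
    print(row)
# (100, '2020-01-01 10:15:00', 4)
# (101, '2020-01-01 11:10:00', 3)
```

One thing to keep in mind: if two rows for the same test action share the same maximum StatusDateTime, the join matches both, so that action appears twice in the result.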

Related Solutions

MySQL – What’s the most efficient way to batch UPDATE queries in MySQL

Since you're using InnoDB tables, the most obvious optimization would be to group multiple UPDATEs into a transaction.

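Because InnoDB is a transactional engine, you pay not just for the UPDATE itself but also for the transactional overhead: managing the transaction buffer and the transaction log, and flushing the log to disk. With autocommit enabled (the default), every standalone UPDATE is its own transaction, so you pay that overhead once per statement.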

If you are logically comfortable with the idea, try and group 100-1000 UPDATEs at a time, each time wrapped like this:

START TRANSACTION;
UPDATE ...
UPDATE ...
UPDATE ...
UPDATE ...
COMMIT;

Possible downsides:

One error will collapse the entire transaction (though this is easily handled in code).
You might wait a long time to accumulate your 1000 UPDATEs, so you may also want a timeout that commits a partial batch.
More complexity in your application code.
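
The batching idea can be sketched from the application side as follows. This is an illustration under stated assumptions, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection, and the items table and the helpers chunked and apply_updates are hypothetical names invented for the example.

```python
import sqlite3
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def apply_updates(conn, updates, batch_size=500):
    """Apply (price, id) updates, one transaction per batch.

    Returns the number of transactions committed.
    """
    commits = 0
    for chunk in chunked(updates, batch_size):
        conn.execute("BEGIN")
        try:
            conn.executemany("UPDATE items SET price = ? WHERE id = ?", chunk)
        except sqlite3.Error:
            # One failing statement discards the whole batch.
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
        commits += 1
    return commits


# isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO items (id, price) VALUES (?, ?)",
    [(i, 0.0) for i in range(1, 2501)],
)

updates = [(float(i), i) for i in range(1, 2501)]
commits = apply_updates(conn, updates, batch_size=1000)
total = conn.execute("SELECT SUM(price) FROM items").fetchone()[0]
print(commits, total)  # 3 3126250.0
```

Here 2500 updates are committed in three transactions (1000, 1000, and 500 rows) instead of 2500 separate ones. A caller that catches the re-raised error can retry the failed batch or fall back to single-row updates to find the offending statement.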

MySQL – Understanding JOINs and why the syntax works the way it does

The answer is in the parentheses:

SET price=(SELECT price FROM new WHERE new.id=original.id)

This is a scalar subquery. It only ever returns one value (or no value, in which case it effectively returns NULL). As @ypercube mentioned in comments, if there were ever more than one matching id in new then this expression would throw an error, since a scalar subquery can't and won't deal with more than one possible value.

This is also (by at least some definitions) a correlated subquery, containing a reference to a table (original) that is not mentioned in the FROM clause of the subquery.

The subquery is, essentially, executed once for each row in original in order to find the needed value in new, and that's how the rows don't get mixed up -- this expression is evaluated for each row in the original table.
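
A small sketch can make the row-by-row evaluation concrete. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, invented sample prices, and double-quoted table names to avoid any keyword clash. One difference to be aware of: SQLite does not raise an error when a scalar subquery matches more than one row, so this sketch only illustrates the per-row lookup and the NULL case, not the error.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "original" (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE "new" (id INTEGER PRIMARY KEY, price REAL);
INSERT INTO "original" (id, price) VALUES (1, 10.0), (2, 20.0), (3, 30.0);
INSERT INTO "new" (id, price) VALUES (1, 11.0), (2, 22.0);
""")

# The correlated scalar subquery is evaluated for each row of "original".
conn.execute("""
UPDATE "original"
SET price = (SELECT price FROM "new" WHERE "new".id = "original".id)
""")

rows = conn.execute('SELECT id, price FROM "original" ORDER BY id').fetchall()
print(rows)  # [(1, 11.0), (2, 22.0), (3, None)]
```

Row 3 has no match in the new table, so the subquery yields no value and the price is set to NULL. That side effect is easy to overlook when the UPDATE has no WHERE clause.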

At least, conceptually, that's how it happens. The optimizer is free to decide, within the limits of its design, whether there's a better way to perform your query than the way you've written it, as long as the chosen approach still produces exactly the same result (except for the ordering of rows, which is undefined unless you explicitly ORDER BY in a SELECT statement). MySQL 5.6 brought some changes in subquery optimization that were largely improvements.

Although sometimes they are absolutely essential, subqueries can be a red flag that a query's logic could be improved, to make things easier on the optimizer and get the work done faster. This is one of those cases.

A better, arguably clearer, and perhaps significantly better-performing way to write the example query would be this:

UPDATE original o
  JOIN new n ON n.id = o.id
   SET o.price = n.price;

There's no need to include WHERE n.id = o.id because the join will not only match the rows on that criterion, it will also exclude all rows that can't be joined. The caveat is that this version also has a problem if id in the new table isn't unique, though a different one from the error the original query would throw. In that case the result isn't deterministic: you have no way to tell the server which of the several matching rows to use, so it will pick one, and you can't choose which. But if id is unique in the new table, there's no problem.
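
The two join behaviors described above (unmatched rows are excluded, and a non-unique id yields several candidate rows) can be inspected with a plain SELECT over the same join. This is a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample data is invented, and it deliberately runs a SELECT rather than the multi-table UPDATE, because the UPDATE ... JOIN syntax shown above is MySQL-specific.

```python
import sqlite3
from collections import Counter

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "original" (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE "new" (id INTEGER, price REAL);  -- id deliberately not unique
INSERT INTO "original" (id, price) VALUES (1, 10.0), (2, 20.0), (3, 30.0);
INSERT INTO "new" (id, price) VALUES (1, 11.0), (1, 12.0), (2, 22.0);
""")

# Same join condition as the UPDATE, but only listing the candidate prices.
candidates = conn.execute("""
SELECT o.id, n.price
FROM "original" AS o
JOIN "new" AS n ON n.id = o.id
ORDER BY o.id, n.price
""").fetchall()
print(candidates)  # [(1, 11.0), (1, 12.0), (2, 22.0)]

# Ids with more than one candidate are the ones a join-based UPDATE
# could not resolve deterministically.
matches_per_id = Counter(row[0] for row in candidates)
ambiguous_ids = sorted(i for i, count in matches_per_id.items() if count > 1)
print(ambiguous_ids)  # [1]
```

Id 3 never appears because it has no partner in the new table, and id 1 appears twice because two rows match it. Running a check like this before a join-based UPDATE is one way to confirm that id really is unique in the source table.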

MySQL has historically not been the best at handling WHERE ... IN ( ... ) in some cases, and this rewrite also avoids the need for that construct.
