Ms-access – Select ONLY Max(date) when there’s repeating dates

maxms accessms-access-2016query

Here is a sample of the data contained in my Tracking table:

tracking_id   date_action  action_id    staffing_id
------------- -----------  -----------  -----------
 1             2019/03/04  4             2
 2             2019/03/04  3             2
 7             2018/10/25  67            3
 4             2018/10/25  8             3
 5             2019/05/05  10            5
 6             2019/05/05  6             5
 8             2019/03/02  55            6

The result I want is the following:

  tracking_id   date_action  action_id    staffing_id
    ------------- -----------  -----------  -----------
     2             2019/03/04  3             2
     4             2018/10/25  8             3
     6             2019/05/05  6             5
     8             2019/03/02  55            6

I need to find the max(date_action_taken) group by staffing_id and join it with other tables to get the columns in the Action table using action_id and all of the columns in the Staffing table using the staffing_id.

I first try to get the the max date in the Tracking table for each staffing_id:

SELECT staffing_id, Max(Tracking.date_action_taken) AS MaxOfdate
FROM Tracking
GROUP BY staffing_id

When I join the query above back to the same table to get the other colmns of the Tracking table. It still returns the max(date_action) but it also returns the repeated ones. The query I used is below:

Select  tracking_id, t2.mxdate, t1.action_id, t1.staffing_id
FROM Tracking t1
inner join
(select  max(date_action) as mxdate, staffing_id
From Tracking
Group by staffing_id
) t2
on t1.staffing_id = t2.staffing_id and t1.date_action_taken = t2.mxdate

I even tried another query to get the max(tracking_id) instead using this query:

Select Tracking.*
From Tracking 
Where tracking_id IN
(Select Max(tracking_id) as t_id
From Tracking
Group by staffing_id
)

This works great until I realize that the max(tracking) is not necessarily the max(date_action).

I even used DISTINCT. And it still returns the repeated max(date_action).

Select  DISTINCT tracking_id, t2.mxdate, t1.action_id, t1.comment, t1.staffing_id
FROM Tracking t1
inner join
(select  DISTINCT max(date_action_taken) as mxdate, staffing_id
From Tracking
Group by staffing_id
) t2
on t1.staffing_id = t2.staffing_id and t1.date_action_taken = t2.mxdate

I am not sure what to do. Is there a way, I can first get the max(date_action) by staffing_id and if two of them are repeated to get the max(tracking_id) between the max(date_action).

I am thinking of adding another column to act as a "flag". A data type yes/no, column named (isTheLastAction) so on each insert in the tracking table. It find the previous isTheLastAction, put it at false, and the new insert to true within the same staffing_id that way, it doesn't depend on the date_action nor the tracking_id. I am just not sure how to do that.

Any thoughts on either helping me solve the max(date_action) or using the flag on insert?

Best Answer

Akina has already pointed out critical problems with the data... that there is no existing way to determine the proper ordering of the Tracking rows. It is tempting to just skip this question, but despite the problems, there are some concepts and SQL queries patterns which can help here.

  1. In a traditional relational database, the storage order of records should never be assumed. Although it may be natural for records to be stored in the order they are added, there is actually no guarantee of this. A database file can become segmented, then later compacted, etc., and the rows may not be reconstructed in the same order. Although Access might usually store the records in a particular order, it is a mistake to rely on any such insertion or storage order.

    • The example [Tracking] data implies a given order by showing [tracking_id] = 7 before [tracking_id] = 4, and comments further indicate that one was inserted before the other. This mistaken assumption is likely causing some of the confusion.
    • A related principle of SQL is that query rows should not be assumed to be in a particular order except when an ORDER BY clause is specified. Once again, it is a mistake to rely on the default order just because it usually matches the insertion order.
    • Specification of various constraints and indexes can be used to facilitate efficient ordering. For example, if one specifies a primary key on the table, it is likely that the rows will be stored by that primary key, but they could very well be stored in segmented groups/patterns to facilitate quick searches. It is still a mistake to expect an SQL query to always return the rows in primary-key order without an explicit GROUP BY clause.
  2. At least within Access, logical order can be implemented by using a long-integer Autonumber column. Access keeps track of used Autonumber values and it will consistently assign a unique number to new rows. In this case, it can be trusted that a higher number was inserted after a lower number. (Note: This behavior is not universally true, since some database systems by default will reuse numbers of deleted records.)

    • It is such common practice for tables to use an Autonumber column, that it was natural to assume that [tracking_id] was such a column. Most users will assume at first read that [tracking_id] = 7 comes AFTER [tracking_id] = 4. Honestly, the fact that they are not in such an order is perplexing, especially without some other field to indicate the insertion order.
    • Another alternative is to use a date field, a "timestamp" field. If the only reason for this field is to indicate insertion/creation order, then I suggest a Date/Time column with Default Value = Now(). (Since Access dates only have resolution down to a second, this is only valid if records will not be inserted faster than each second.)
    • The "flag" column as you described is discourage and should be unnecessary if one of the other above schema is used. Manually changing flag values upon insertion and edits can become a maintenance nightmare and without proper constraints (Validation Rules and/or unique indexes), it can lead to contradictory data.

For the following queries, I will assume that [tracking_id] is a unique Autonumber field and that it consistently represents insertion order so that a higher number is the "latest" record with the same [date_action].

I assume that it is possible for each [staffing_id] to have multiple [date_action] values. The example data shows only one date (albeit multiple rows) for each [staffing_id]. It is tempting to simplify the queries based only on the sample data, but that is a mistake which can return incorrect aggregate values. The existing two answers make this mistake.

The question already contains all of the various parts of a solution, but never combines them to produce a complete solution. The mistake is in assuming that the desired results can be achieved with only one aggregate subquery. In fact, there needs to be two subqueries which find the max date and the max tracking_id separately! (Honestly, the question text outlines the following three steps using words, but then fails to match the logic in the queries.)

  • Query 1: First get the maximum [date_action] for a given [staffing_id],
  • Query 2: then get the maximum [tracking_id] for the maximum date and id found in query 1. (You can't find the maximum [tracking_id] value separately or at the same time--in the same aggregate query--before knowing the maximum date.)
  • Query 3: Finally get the record details (other fields) by linking to the found maximum [tracking_id]. (Because the staffing_id and date_action have already been selected in the inner queries AND because tracking_id is considered a unique key, it is only necessary for final selection on that key value. In other words, it is redundant to include the other fields in the outermost ON clause.)

SELECT t1.*
FROM Tracking AS t1
  INNER JOIN (
    SELECT Max(t2.tracking_id) AS MaxOfTracking_id
    FROM Tracking AS t2
      INNER JOIN (
        SELECT t3.staffing_id, Max(t3.date_action) AS MaxOfdate
        FROM Tracking AS t3
        GROUP BY t3.staffing_id
      ) AS tad
      ON t2.staffing_id = tad.staffing_id AND t2.date_action = tad.MaxOfdate
    GROUP BY t2.staffing_id, t2.date_action
  ) AS tai
  ON t1.tracking_id = tai.MaxOfTracking_id
ORDER BY t1.staffing_id

Returns:

tracking_id   date_action  action_id  staffing_id
2             3/4/2019     3          2
7             10/25/2018   67         3
6             5/5/2019     6          5
8             3/2/2019     55         6