Here is a sample of the data contained in my Tracking
table:
tracking_id date_action action_id staffing_id
------------- ----------- ----------- -----------
1 2019/03/04 4 2
2 2019/03/04 3 2
7 2018/10/25 67 3
4 2018/10/25 8 3
5 2019/05/05 10 5
6 2019/05/05 6 5
8 2019/03/02 55 6
The result I want is the following:
tracking_id date_action action_id staffing_id
------------- ----------- ----------- -----------
2 2019/03/04 3 2
4 2018/10/25 8 3
6 2019/05/05 6 5
8 2019/03/02 55 6
I need to find the max(date_action_taken)
group by staffing_id
and join it with other tables to get the columns in the Action table using action_id and all of the columns in the Staffing
table using the staffing_id
.
I first try to get the the max date in the Tracking
table for each staffing_id
:
SELECT staffing_id, Max(Tracking.date_action_taken) AS MaxOfdate
FROM Tracking
GROUP BY staffing_id
When I join the query above back to the same table to get the other colmns of the Tracking table. It still returns the max(date_action)
but it also returns the repeated ones. The query I used is below:
Select tracking_id, t2.mxdate, t1.action_id, t1.staffing_id
FROM Tracking t1
inner join
(select max(date_action) as mxdate, staffing_id
From Tracking
Group by staffing_id
) t2
on t1.staffing_id = t2.staffing_id and t1.date_action_taken = t2.mxdate
I even tried another query to get the max(tracking_id)
instead using this query:
Select Tracking.*
From Tracking
Where tracking_id IN
(Select Max(tracking_id) as t_id
From Tracking
Group by staffing_id
)
This works great until I realize that the max(tracking)
is not necessarily the max(date_action)
.
I even used DISTINCT. And it still returns the repeated max(date_action)
.
Select DISTINCT tracking_id, t2.mxdate, t1.action_id, t1.comment, t1.staffing_id
FROM Tracking t1
inner join
(select DISTINCT max(date_action_taken) as mxdate, staffing_id
From Tracking
Group by staffing_id
) t2
on t1.staffing_id = t2.staffing_id and t1.date_action_taken = t2.mxdate
I am not sure what to do. Is there a way, I can first get the max(date_action)
by staffing_id and if two of them are repeated to get the max(tracking_id)
between the max(date_action)
.
I am thinking of adding another column to act as a "flag". A data type yes/no
, column named (isTheLastAction)
so on each insert in the tracking table. It find the previous isTheLastAction
, put it at false, and the new insert to true within the same staffing_id
that way, it doesn't depend on the date_action
nor the tracking_id
. I am just not sure how to do that.
Any thoughts on either helping me solve the max(date_action)
or using the flag on insert?
Best Answer
Akina has already pointed out critical problems with the data... that there is no existing way to determine the proper ordering of the Tracking rows. It is tempting to just skip this question, but despite the problems, there are some concepts and SQL queries patterns which can help here.
In a traditional relational database, the storage order of records should never be assumed. Although it may be natural for records to be stored in the order they are added, there is actually no guarantee of this. A database file can become segmented, then later compacted, etc., and the rows may not be reconstructed in the same order. Although Access might usually store the records in a particular order, it is a mistake to rely on any such insertion or storage order.
[Tracking]
data implies a given order by showing[tracking_id] = 7
before[tracking_id] = 4
, and comments further indicate that one was inserted before the other. This mistaken assumption is likely causing some of the confusion.At least within Access, logical order can be implemented by using a long-integer Autonumber column. Access keeps track of used Autonumber values and it will consistently assign a unique number to new rows. In this case, it can be trusted that a higher number was inserted after a lower number. (Note: This behavior is not universally true, since some database systems by default will reuse numbers of deleted records.)
[tracking_id]
was such a column. Most users will assume at first read that[tracking_id] = 7
comes AFTER[tracking_id] = 4
. Honestly, the fact that they are not in such an order is perplexing, especially without some other field to indicate the insertion order.Default Value = Now()
. (Since Access dates only have resolution down to a second, this is only valid if records will not be inserted faster than each second.)For the following queries, I will assume that
[tracking_id]
is a unique Autonumber field and that it consistently represents insertion order so that a higher number is the "latest" record with the same[date_action]
.I assume that it is possible for each
[staffing_id]
to have multiple[date_action]
values. The example data shows only one date (albeit multiple rows) for each[staffing_id]
. It is tempting to simplify the queries based only on the sample data, but that is a mistake which can return incorrect aggregate values. The existing two answers make this mistake.The question already contains all of the various parts of a solution, but never combines them to produce a complete solution. The mistake is in assuming that the desired results can be achieved with only one aggregate subquery. In fact, there needs to be two subqueries which find the max date and the max tracking_id separately! (Honestly, the question text outlines the following three steps using words, but then fails to match the logic in the queries.)
[date_action]
for a given[staffing_id]
,[tracking_id]
for the maximum date and id found in query 1. (You can't find the maximum[tracking_id]
value separately or at the same time--in the same aggregate query--before knowing the maximum date.)[tracking_id]
. (Because thestaffing_id
anddate_action
have already been selected in the inner queries AND becausetracking_id
is considered a unique key, it is only necessary for final selection on that key value. In other words, it is redundant to include the other fields in the outermost ON clause.)Returns: