I'm working with a food purchasing / invoice system in MS Access 2013 and am trying to create an SQL query that will return the most recent purchase price for each individual food item.
Here is a diagram of the tables I'm working with:
My understanding of SQL is very basic, and I tried the following (incorrect) query, in the hopes that it would return only one record per item (because of the DISTINCT
operator) and that it would only return the most recent purchase (since I did ORDER BY [Invoice Date] DESC
)
SELECT DISTINCT ([Food items].Item),
[Food items].Item, [Food purchase data].[Price per unit], [Food purchase data].[Purchase unit], Invoices.[Invoice Date]
FROM Invoices
INNER JOIN ([Food items]
INNER JOIN [Food purchase data]
ON [Food items].ID = [Food purchase data].[Food item ID])
ON Invoices.ID = [Food purchase data].[Invoice ID]
ORDER BY Invoices.[Invoice Date] DESC;
However the query above simply returns all of the food purchases (i.e. multiple records for each record in [Food items]
), with the results sorted descending by date. Can someone explain to me what I am misunderstanding about the DISTINCT
operator? That is, why is it not returning only one record for each item in [Food items]
?
And more to the point – what is the simplest way for me to just pull the most recent food purchase data for each individual food item, given the table structure shown above? I don't really care about efficiency as much as simplicity (the database I'm working with is rather small – it will be years before it's even in the tens of thousands of records range). I care more about the query being understandable for someone with little knowledge of SQL.
UPDATE:
So I tried, both of the answers suggested below, and neither of them work (they just throw up syntax errors).
Based on the suggestions below, and further reading online, I wrote the following new query, using the aggregate function max()
and a GROUP BY
clause:
SELECT [Food purchase data].[Food item ID], [Food purchase data].[Price per unit], max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID], [Food purchase data].[Price per unit];
But I am still having the same problem: that is, I'm still seeing more than one result for each food item. Can anyone explain why this query is not only returning the most recent purchase for each food item?
UPDATE 2 (SOLVED!):
None of the answers below quite worked out but based on some heavy modification of Vladimir's answer below, I was able to create the following queries, which appear to be giving the correct results.
First, I created this view and named it "LatestInvoices":
SELECT InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate, InvoicesMaxDate.MaxID
FROM [Food purchase data], Invoices, (SELECT [Food purchase data].[Food item ID] AS ItemID, MAX(Invoices.[Invoice Date]) AS MaxDate, MAX(Invoices.[Invoice ID]) AS MaxID
FROM [Food purchase data], Invoices
WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
GROUP BY [Food purchase data].[Food item ID]
) AS InvoicesMaxDate
WHERE InvoicesMaxDate.MaxID = [Food purchase data].[Invoice ID] AND
InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND
InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate, InvoicesMaxDate.MaxID
Then I wrote another query to pull in the fields I needed:
SELECT [Food items].ID AS FoodItemID, [Food items].Item AS FoodItem, [Food purchase data].[Price], [Food purchase data].[Price per unit], [Food purchase data].[Purchase unit], LatestInvoices.MaxDate as InvoiceDate
FROM [Food items], [Food purchase data], LatestInvoices
WHERE LatestInvoices.[MaxID] = [Food purchase data].[Invoice ID] AND
LatestInvoices.ItemID = [Food purchase data].[Food item ID] AND
LatestInvoices.ItemID = [Food items].ID
ORDER BY [Food items].Item;
Thanks to all of you who took the time to help me with this!
Best Answer
MS Access is rather limited.
I assume that it is possible to have more than one invoice for the same date. In this case I'll pick an invoice with the highest ID.
At first we'll find maximum Invoice Date for each Food Item.
Since it is possible that there are several invoices for the found max date we'll pick one invoice with the max ID per Item
Based on the MS Access syntax of nested joins and using this example from the docs:
Let's try to put it together:
Now we have both ItemID and ID of the last Invoice for that Item. Join this to original tables to fetch other details (columns).
In practice I'd create a view for the first query with a single join. Then I'd create a second view that joins the first view with the tables, then the third view and so on, to avoid the nested joins or minimize them. Overall query would be easier to read.
Edit to clarify what I mean based on your final solution that you put into the question.
One last attempt to convey my message.
This is what you wrote based on my suggestions above:
This is what I meant:
Do you see the difference?
The
InvoicesMaxDate
returns MAXInvoice Date
for eachFood item ID
. If there are two invoices for the sameFood item ID
with the same MAXInvoice Date
we should pick one invoice among them. This is done by grouping byInvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate
. There should be no grouping byInvoices.[Invoice ID]
here, because we want to pick the invoice with the maximum ID.Once you have this query saved as a
LatestInvoices
view, it is used further as you correctly wrote (note, that the final query usesLatestInvoices.[Invoice ID]
andLatestInvoices.ItemID
, but doesn't useLatestInvoices.MaxDate
):As for, why your last query in the question returns several rows per Item:
You are grouping here by
[Food item ID]
and[Price per unit]
, so you'll get as many rows as there are unique combinations of these two columns.The following query would return one row per
[Food item ID]
.A side note, you really should use explicit
INNER JOIN
instead of,
. That syntax is 20 years old.