Ms-access – SQL queries to only show most recent purchase records for individual food items

datedistinctjoin;ms accesssorting

I'm working with a food purchasing / invoice system in MS Access 2013 and am trying to create an SQL query that will return the most recent purchase price for each individual food item.

Here is a diagram of the tables I'm working with:
Tables in MS Access database

My understanding of SQL is very basic, and I tried the following (incorrect) query, in the hopes that it would return only one record per item (because of the DISTINCT operator) and that it would only return the most recent purchase (since I did ORDER BY [Invoice Date] DESC)

SELECT DISTINCT ([Food items].Item), 
    [Food items].Item, [Food purchase data].[Price per unit], [Food purchase data].[Purchase unit], Invoices.[Invoice Date]
FROM Invoices
INNER JOIN ([Food items] 
    INNER JOIN [Food purchase data] 
    ON [Food items].ID = [Food purchase data].[Food item ID]) 
ON Invoices.ID = [Food purchase data].[Invoice ID]
ORDER BY Invoices.[Invoice Date] DESC;

However the query above simply returns all of the food purchases (i.e. multiple records for each record in [Food items]), with the results sorted descending by date. Can someone explain to me what I am misunderstanding about the DISTINCT operator? That is, why is it not returning only one record for each item in [Food items]?

And more to the point – what is the simplest way for me to just pull the most recent food purchase data for each individual food item, given the table structure shown above? I don't really care about efficiency as much as simplicity (the database I'm working with is rather small – it will be years before it's even in the tens of thousands of records range). I care more about the query being understandable for someone with little knowledge of SQL.

UPDATE:
So I tried, both of the answers suggested below, and neither of them work (they just throw up syntax errors).

Based on the suggestions below, and further reading online, I wrote the following new query, using the aggregate function max() and a GROUP BY clause:

SELECT [Food purchase data].[Food item ID], [Food purchase data].[Price per unit], max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID], [Food purchase data].[Price per unit];

But I am still having the same problem: that is, I'm still seeing more than one result for each food item. Can anyone explain why this query is not only returning the most recent purchase for each food item?

UPDATE 2 (SOLVED!):

None of the answers below quite worked out but based on some heavy modification of Vladimir's answer below, I was able to create the following queries, which appear to be giving the correct results.

First, I created this view and named it "LatestInvoices":

SELECT InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate, InvoicesMaxDate.MaxID
FROM [Food purchase data], Invoices, (SELECT [Food purchase data].[Food item ID] AS ItemID, MAX(Invoices.[Invoice Date]) AS MaxDate, MAX(Invoices.[Invoice ID]) AS MaxID
                FROM [Food purchase data], Invoices
                WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
                GROUP BY [Food purchase data].[Food item ID]
         )  AS InvoicesMaxDate
WHERE InvoicesMaxDate.MaxID = [Food purchase data].[Invoice ID] AND
                      InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND 
                      InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate,  InvoicesMaxDate.MaxID

Then I wrote another query to pull in the fields I needed:

SELECT [Food items].ID AS FoodItemID, [Food items].Item AS FoodItem, [Food purchase data].[Price], [Food purchase data].[Price per unit], [Food purchase data].[Purchase unit], LatestInvoices.MaxDate as InvoiceDate
FROM [Food items], [Food purchase data], LatestInvoices
WHERE LatestInvoices.[MaxID] = [Food purchase data].[Invoice ID] AND
             LatestInvoices.ItemID = [Food purchase data].[Food item ID] AND
             LatestInvoices.ItemID = [Food items].ID
ORDER BY [Food items].Item;

Thanks to all of you who took the time to help me with this!

Best Answer

MS Access is rather limited.

I assume that it is possible to have more than one invoice for the same date. In this case I'll pick an invoice with the highest ID.

At first we'll find maximum Invoice Date for each Food Item.

SELECT
    FPD1.[Food item ID] AS ItemID
    ,MAX(I1.[Invoice Date]) AS MaxDate
FROM
    [Food purchase data] AS FPD1
    INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
GROUP BY
    FPD1.[Food item ID]

Since it is possible that there are several invoices for the found max date we'll pick one invoice with the max ID per Item

Based on the MS Access syntax of nested joins and using this example from the docs:

SELECT fields 
FROM 
  table1 INNER JOIN 
  (
      table2 INNER JOIN 
      (
          table3 INNER JOIN tablex ON table3.field3 = tablex.fieldx
      ) ON table2.field2 = table3.field3
  ) ON table1.field1 = table2.field2
;

Let's try to put it together:

SELECT
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate
    ,MAX(I2.ID) AS MaxInvoiceID
FROM
    (
        SELECT
            FPD1.[Food item ID] AS ItemID
            ,MAX(I1.[Invoice Date]) AS MaxDate
        FROM
            [Food purchase data] AS FPD1
            INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
        GROUP BY
            FPD1.[Food item ID]
    ) AS InvoicesMaxDate INNER JOIN
    (
        [Food purchase data] AS FPD2 
        INNER JOIN Invoices AS I2 ON I2.ID = FPD2.[Invoice ID]
    ) ON
        InvoicesMaxDate.ItemID = FPD2.[Food item ID] AND
        --- you may need to put extra "ON" here as well, not sure
        InvoicesMaxDate.MaxDate = I2.[Invoice Date]
GROUP BY
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate

Now we have both ItemID and ID of the last Invoice for that Item. Join this to original tables to fetch other details (columns).

SELECT
    FI3.Item
    ,FI3.Item
    ,FPD3.[Price per unit]
    ,FPD3.[Purchase unit]
    ,I3.[Invoice Date]
FROM
    (
        SELECT
            InvoicesMaxDate.ItemID
            ,InvoicesMaxDate.MaxDate
            ,MAX(I2.ID) AS MaxInvoiceID
        FROM
            (
                SELECT
                    FPD1.[Food item ID] AS ItemID
                    ,MAX(I1.[Invoice Date]) AS MaxDate
                FROM
                    [Food purchase data] AS FPD1
                    INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
                GROUP BY
                    FPD1.[Food item ID]
            ) AS InvoicesMaxDate INNER JOIN
            (
                [Food purchase data] AS FPD2 
                INNER JOIN Invoices AS I2 ON I2.ID = FPD2.[Invoice ID]
            ) ON
                InvoicesMaxDate.ItemID = FPD2.[Food item ID] AND
                InvoicesMaxDate.MaxDate = I2.[Invoice Date]
        GROUP BY
            InvoicesMaxDate.ItemID
            ,InvoicesMaxDate.MaxDate
    ) AS LastInvoices INNER JOIN
    (
        [Food items] AS FI3 INNER JOIN
        (
            [Food purchase data] AS FPD3
            INNER JOIN Invoices AS I3 ON I3.ID = FPD3.[Invoice ID]
        ) ON FI3.ID = FDP3.[Food item ID]
    ) ON
        LastInvoices.MaxInvoiceID = I3.ID AND
        LastInvoices.ItemID = FI3.ID

In practice I'd create a view for the first query with a single join. Then I'd create a second view that joins the first view with the tables, then the third view and so on, to avoid the nested joins or minimize them. Overall query would be easier to read.


Edit to clarify what I mean based on your final solution that you put into the question.

One last attempt to convey my message.

This is what you wrote based on my suggestions above:

SELECT
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate
    ,Invoices.[Invoice ID]
FROM [Food purchase data], Invoices, 
    (
        SELECT 
            [Food purchase data].[Food item ID] AS ItemID
            ,MAX(Invoices.[Invoice Date]) AS MaxDate
        FROM [Food purchase data], Invoices
        WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
        GROUP BY [Food purchase data].[Food item ID]
    )  AS InvoicesMaxDate
WHERE
    Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
    InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND 
    InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate, Invoices.[Invoice ID];

This is what I meant:

SELECT
    InvoicesMaxDate.ItemID
    ,InvoicesMaxDate.MaxDate
    ,MAX(Invoices.[Invoice ID]) AS [Invoice ID]
FROM [Food purchase data], Invoices, 
    (
        SELECT
            [Food purchase data].[Food item ID] AS ItemID
            ,MAX(Invoices.[Invoice Date]) AS MaxDate
        FROM [Food purchase data], Invoices
        WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
        GROUP BY [Food purchase data].[Food item ID]
    )  AS InvoicesMaxDate
WHERE
    Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
    InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND 
    InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate;

Do you see the difference?

The InvoicesMaxDate returns MAX Invoice Date for each Food item ID. If there are two invoices for the same Food item ID with the same MAX Invoice Date we should pick one invoice among them. This is done by grouping by InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate. There should be no grouping by Invoices.[Invoice ID] here, because we want to pick the invoice with the maximum ID.

Once you have this query saved as a LatestInvoices view, it is used further as you correctly wrote (note, that the final query uses LatestInvoices.[Invoice ID] and LatestInvoices.ItemID, but doesn't use LatestInvoices.MaxDate):

SELECT 
    [Food items].ID as FoodItemID
    ,[Food items].Item as FoodItem
    ,[Food purchase data].[Price]
    ,[Food purchase data].[Price per unit]
    ,[Food purchase data].[Purchase unit]
    ,Invoices.[Invoice Date]
FROM [Food items], [Food purchase data], Invoices, LatestInvoices
WHERE 
    Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
    [Food items].ID = [Food purchase data].[Food item ID] AND
    LatestInvoices.[Invoice ID] = Invoices.[Invoice ID] AND 
    LatestInvoices.ItemID = [Food items].ID
ORDER BY [Food items].Item

As for, why your last query in the question returns several rows per Item:

SELECT 
    [Food purchase data].[Food item ID]
    , [Food purchase data].[Price per unit]
    , max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID], [Food purchase data].[Price per unit];

You are grouping here by [Food item ID] and [Price per unit], so you'll get as many rows as there are unique combinations of these two columns.

The following query would return one row per [Food item ID].

SELECT 
    [Food purchase data].[Food item ID]
    , max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID];

A side note, you really should use explicit INNER JOIN instead of ,. That syntax is 20 years old.

SELECT 
    [Food purchase data].[Food item ID]
    , max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM
    [Food purchase data]
    INNER JOIN Invoices ON Invoices.ID = [Food purchase data].[Invoice ID]
GROUP BY [Food purchase data].[Food item ID];