MS Access is rather limited.
I assume that it is possible to have more than one invoice for the same date.
In this case I'll pick an invoice with the highest ID.
At first we'll find maximum Invoice Date for each Food Item.
SELECT
FPD1.[Food item ID] AS ItemID
,MAX(I1.[Invoice Date]) AS MaxDate
FROM
[Food purchase data] AS FPD1
INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
GROUP BY
FPD1.[Food item ID]
Since it is possible that there are several invoices for the found max date we'll pick one invoice with the max ID per Item
Based on the MS Access syntax of nested joins and using this example from the docs:
SELECT fields
FROM
table1 INNER JOIN
(
table2 INNER JOIN
(
table3 INNER JOIN tablex ON table3.field3 = tablex.fieldx
) ON table2.field2 = table3.field3
) ON table1.field1 = table2.field2
;
Let's try to put it together:
SELECT
InvoicesMaxDate.ItemID
,InvoicesMaxDate.MaxDate
,MAX(I2.ID) AS MaxInvoiceID
FROM
(
SELECT
FPD1.[Food item ID] AS ItemID
,MAX(I1.[Invoice Date]) AS MaxDate
FROM
[Food purchase data] AS FPD1
INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
GROUP BY
FPD1.[Food item ID]
) AS InvoicesMaxDate INNER JOIN
(
[Food purchase data] AS FPD2
INNER JOIN Invoices AS I2 ON I2.ID = FPD2.[Invoice ID]
) ON
InvoicesMaxDate.ItemID = FPD2.[Food item ID] AND
--- you may need to put extra "ON" here as well, not sure
InvoicesMaxDate.MaxDate = I2.[Invoice Date]
GROUP BY
InvoicesMaxDate.ItemID
,InvoicesMaxDate.MaxDate
Now we have both ItemID and ID of the last Invoice for that Item.
Join this to original tables to fetch other details (columns).
SELECT
FI3.Item
,FI3.Item
,FPD3.[Price per unit]
,FPD3.[Purchase unit]
,I3.[Invoice Date]
FROM
(
SELECT
InvoicesMaxDate.ItemID
,InvoicesMaxDate.MaxDate
,MAX(I2.ID) AS MaxInvoiceID
FROM
(
SELECT
FPD1.[Food item ID] AS ItemID
,MAX(I1.[Invoice Date]) AS MaxDate
FROM
[Food purchase data] AS FPD1
INNER JOIN Invoices AS I1 ON I1.ID = FPD1.[Invoice ID]
GROUP BY
FPD1.[Food item ID]
) AS InvoicesMaxDate INNER JOIN
(
[Food purchase data] AS FPD2
INNER JOIN Invoices AS I2 ON I2.ID = FPD2.[Invoice ID]
) ON
InvoicesMaxDate.ItemID = FPD2.[Food item ID] AND
InvoicesMaxDate.MaxDate = I2.[Invoice Date]
GROUP BY
InvoicesMaxDate.ItemID
,InvoicesMaxDate.MaxDate
) AS LastInvoices INNER JOIN
(
[Food items] AS FI3 INNER JOIN
(
[Food purchase data] AS FPD3
INNER JOIN Invoices AS I3 ON I3.ID = FPD3.[Invoice ID]
) ON FI3.ID = FDP3.[Food item ID]
) ON
LastInvoices.MaxInvoiceID = I3.ID AND
LastInvoices.ItemID = FI3.ID
In practice I'd create a view for the first query with a single join. Then I'd create a second view that joins the first view with the tables, then the third view and so on, to avoid the nested joins or minimize them. Overall query would be easier to read.
Edit to clarify what I mean based on your final solution that you put into the question.
One last attempt to convey my message.
This is what you wrote based on my suggestions above:
SELECT
InvoicesMaxDate.ItemID
,InvoicesMaxDate.MaxDate
,Invoices.[Invoice ID]
FROM [Food purchase data], Invoices,
(
SELECT
[Food purchase data].[Food item ID] AS ItemID
,MAX(Invoices.[Invoice Date]) AS MaxDate
FROM [Food purchase data], Invoices
WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
GROUP BY [Food purchase data].[Food item ID]
) AS InvoicesMaxDate
WHERE
Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND
InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate, Invoices.[Invoice ID];
This is what I meant:
SELECT
InvoicesMaxDate.ItemID
,InvoicesMaxDate.MaxDate
,MAX(Invoices.[Invoice ID]) AS [Invoice ID]
FROM [Food purchase data], Invoices,
(
SELECT
[Food purchase data].[Food item ID] AS ItemID
,MAX(Invoices.[Invoice Date]) AS MaxDate
FROM [Food purchase data], Invoices
WHERE Invoices.[Invoice ID] = [Food purchase data].[Invoice ID]
GROUP BY [Food purchase data].[Food item ID]
) AS InvoicesMaxDate
WHERE
Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
InvoicesMaxDate.ItemID = [Food purchase data].[Food item ID] AND
InvoicesMaxDate.MaxDate = Invoices.[Invoice Date]
GROUP BY InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate;
Do you see the difference?
The InvoicesMaxDate
returns MAX Invoice Date
for each Food item ID
.
If there are two invoices for the same Food item ID
with the same MAX Invoice Date
we should pick one invoice among them.
This is done by grouping by InvoicesMaxDate.ItemID, InvoicesMaxDate.MaxDate
.
There should be no grouping by Invoices.[Invoice ID]
here, because we want to pick the invoice with the maximum ID.
Once you have this query saved as a LatestInvoices
view, it is used further as you correctly wrote (note, that the final query uses LatestInvoices.[Invoice ID]
and LatestInvoices.ItemID
, but doesn't use LatestInvoices.MaxDate
):
SELECT
[Food items].ID as FoodItemID
,[Food items].Item as FoodItem
,[Food purchase data].[Price]
,[Food purchase data].[Price per unit]
,[Food purchase data].[Purchase unit]
,Invoices.[Invoice Date]
FROM [Food items], [Food purchase data], Invoices, LatestInvoices
WHERE
Invoices.[Invoice ID] = [Food purchase data].[Invoice ID] AND
[Food items].ID = [Food purchase data].[Food item ID] AND
LatestInvoices.[Invoice ID] = Invoices.[Invoice ID] AND
LatestInvoices.ItemID = [Food items].ID
ORDER BY [Food items].Item
As for, why your last query in the question returns several rows per Item:
SELECT
[Food purchase data].[Food item ID]
, [Food purchase data].[Price per unit]
, max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID], [Food purchase data].[Price per unit];
You are grouping here by [Food item ID]
and [Price per unit]
, so you'll get as many rows as there are unique combinations of these two columns.
The following query would return one row per [Food item ID]
.
SELECT
[Food purchase data].[Food item ID]
, max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM [Food purchase data], Invoices
GROUP BY [Food purchase data].[Food item ID];
A side note, you really should use explicit INNER JOIN
instead of ,
. That syntax is 20 years old.
SELECT
[Food purchase data].[Food item ID]
, max(Invoices.[Invoice Date]) AS MostRecentInvoiceDate
FROM
[Food purchase data]
INNER JOIN Invoices ON Invoices.ID = [Food purchase data].[Invoice ID]
GROUP BY [Food purchase data].[Food item ID];
It looks like you're on the right path. Your datebit pattern idea is an interesting option, I'd like to hear if you make any more progress towards that design.
My suggestion is a pretty simple one. Why not split the ItemOfType out to their own tables and reference them with a surrogate id?. I know it isn't always a popular idea if you have good natural keys available, but in terms of space saving it could reduce your storage requirement a bit.
This was too much for a comment, please don't crucify me.
create table dbo.ItemOfType1 (
ItemOfType1Id int identity(1,1) not null
, ItemOfType1Code char(12) not null
, ReferenceType bit not null
, constraint pkc_ItemOfType1 primary key clustered (ItemOfType1Id)
, constraint uq_ItemOfType1_ItemOfType1Code_ReferenceType unique nonclustered (ItemOfType1Code, ReferenceType)
);
create table dbo.ItemOfType2 (
ItemOfType2Id int identity(1,1) not null
, ItemOfType2Code char(7) not null
, ReferenceType bit not null
, constraint pkc_ItemOfType2 primary key clustered (ItemOfType2Id)
, constraint uq_ItemOfType2_ItemOfType2Code_ReferenceType unique nonclustered (ItemOfType2Code, ReferenceType)
);
create table dbo.ItemOfType1Dates (
ItemOfType1Id int not null
, ReferenceDate date not null
, constraint pkc_ItemOfType1Dates primary key clustered (ReferenceDate,ItemOfType1Id)
, constraint fk_ItemOfType1Dates_ItemOfType1_ItemOfType1Id foreign key (ItemOfType1Id) references dbo.ItemOfType1 (ItemOfType1Id)
);
create table dbo.ItemOfType2Dates (
ItemOfType2Id int not null
, ReferenceDate date not null
, constraint pkc_ItemOfType2Dates primary key clustered (ReferenceDate,ItemOfType2Id)
, constraint fk_ItemOfType2Dates_ItemOfType2_ItemOfType2Id foreign key (ItemOfType2Id) references dbo.ItemOfType2 (ItemOfType2Id)
);
/* -- date ranges alternate
create table dbo.ItemOfType1DateRanges (
ItemOfType1Id int not null
, FromBusinessDate date not null
, ToBusinessDate date not null
, constraint pkc_ItemOfType1DateRanges primary key clustered (FromBusinessDate,ItemOfType1Id)
, constraint fk_ItemOfType1DateRanges_ItemOfType1_ItemOfType1Id foreign key (ItemOfType1Id) references dbo.ItemOfType1 (ItemOfType1Id)
);
create table dbo.ItemOfType2DateRanges (
ItemOfType2Id int not null
, FromBusinessDate date not null
, ToBusinessDate date not null
, constraint pkc_ItemOfType2DateRanges primary key clustered (FromBusinessDate,ItemOfType2Id)
, constraint fk_ItemOfType2DateRanges_ItemOfType2_ItemOfType2Id foreign key (ItemOfType2Id) references dbo.ItemOfType2 (ItemOfType2Id)
);
*/
Best Answer
Problem Categorization
I was looking for a way to use an analytical function that keeps track of inline manipulations. A single run analytical function is only able to perform so much, but not to the extend to solve this problem. The problem with nesting analytical functions is that we loose information about our dynamic pattern.
To allow dynamic inline pattern matching, in Oracle you can use MATCH_RECOGNIZE. I had no clue how to do it in Sql Server though. Then I came across a similar problem, which got resolved using a recursive CTE.
Proposed Solution
CteBase
andCteRecursive
are heavely inspired by Bogdan Sahlean's answer on this related question.