Basically seems to be picking the MAX id from a partition, not max across all partitions
Writing TOP (1)
without an ORDER BY
clause to define which row is 'top' means the query processor is logically free to return any row from the set. The query plan selected by the optimizer happens to return a particular row (highest id from the first partition) but you cannot rely on this, even if it were a useful result.
Whenever you use TOP
you should always specify an ORDER BY
at the same scope to produce deterministic behaviour - unless you really do not care which row(s) come back.
Given the table size SELECT MAX(id) FROM tableA will not perform well enough
The optimizer is lacking some logic to transform a scalar MAX
or MIN
aggregate over a partitioned index to a global aggregate over per-partition aggregates. Itzik Ben-Gan explains the limitation and provides a general workaround in this article.
If the highest partition number is known and guaranteed not to change, the workaround to specify a literal partition using the $partition
function will work, though it may fail in a non-obvious way if the partitioning strategy changes in future.
This 'solution' works by eliminating all but one partition, resulting in a simple seek on one partition of the index.
Adding an order by id does not improve performance for some reason
The same optimizer limitation broadly applies to TOP (1) ... ORDER BY
. The ORDER BY
makes the result deterministic, but does not help produce a more efficient plan in this particular case (but see below).
Implied Index Keys
Your index is on id DESC, timeSampled DESC
. In SQL Server 2008 and later, partitioning introduces an extra implied leading key on $partition ASC
(it is always ascending, it is not configurable) making the full index key $partition ASC, id DESC, timeSampled DESC
.
Since id
and timeSampled
increase together (though there is nothing in the schema to guarantee this) you could rewrite the query as TOP (1) ... ORDER BY $partition DESC, id DESC
. Unfortunately, the DESC
keys on your index and ASC
implied leading key $partition
means the index could not be used to scan just one row from the index in order.
If your index keys were instead id ASC, timeSampled ASC
the whole index key would be $partition ASC, id ASC, timeSampled ASC
. This all-ASC
index could be scanned backward, returning just the first row in key order. This row would be guaranteed to have the highest id
value in the highest-numbered partition. Given the (unenforced) relationship between id and partition id, this would produce the correct result with an optimal execution plan that reads just a single row.
This 'solution' lacks integrity because the id-timeSampled relationship is not enforced, and you probably do not want to rebuild the nonclustered primary key anyway. Nevertheless, I mention it because it may enhance your understanding of how partitioning interacts with indexes.
It seems to ignore any index I put on it
Unless you're using SQL Server Enterprise Edition (or equivalently, Trial and Developer), you will need to use WITH (NOEXPAND)
on the view reference in order to use it. In fact, even if you are using Enterprise, there are good reasons to use that hint.
Without the hint, the query optimizer (in Enterprise Edition) may make a cost-based choice between using the materialized view or accessing the base tables. Where the view is as large as the base tables, this calculation may favour the base tables.
Another point of interest is that without a NOEXPAND
hint, view references are always expanded to the base query before optimization begins. As optimization progresses, the optimizer may or may not be able to match the expanded definition back to the materialized view, depending on previous optimization activity. This is almost certainly not the case with your simple query, but I mention it for completeness.
So, using the NOEXPAND
table hint is your main option, but you might also think about just materializing the base table keys and the columns needed for ordering in the view. Create a unique clustered index on the combined key columns, then a separate nonclustered index on the ordering columns.
This will reduce the size of the materialized view, and limit the number of automatic updates that must be made to keep the view synchronized with the base tables. Your query can then be written to fetch the top 1 keys in the required order from the view (ideally with NOEXPAND
), then join back to the base tables to fetch any remaining columns using the keys from the view.
Another variation is to cluster the view on the ordering columns and table keys, then write the query to manually fetch the non-view columns from the base table using the keys. The best option for you depends on the broader context. A good way to decide is to test it with the real data and workload.
Basic solution
CREATE VIEW VI_test
WITH SCHEMABINDING
AS
SELECT
t1.PK_ID1,
t1.something1,
t1.somethingelse1,
t2.PK_ID2,
t2.FK_ID1,
t2.something2,
t2.somethingelse2
FROM dbo.TB_test1 t1
INNER JOIN dbo.TB_test2 t2
ON t1.PK_ID1 = t2.FK_ID1;
GO
-- Brute force unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq
ON dbo.VI_test
(somethingelse1, somethingelse2, PK_ID1, PK_ID2);
GO
SELECT TOP (1) *
FROM dbo.VI_test WITH (NOEXPAND)
ORDER BY somethingelse1,somethingelse2;
Execution plan:
Using a nonclustered index
-- Minimal unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq
ON dbo.VI_test
(PK_ID1, PK_ID2)
WITH (DROP_EXISTING = ON);
GO
-- Nonclustered index for ordering
CREATE NONCLUSTERED INDEX ix
ON dbo.VI_test (somethingelse1, somethingelse2);
Execution plan:
There is a lookup in this plan, but it is only used to fetch a single row.
Minimal Indexed View
ALTER VIEW VI_test
WITH SCHEMABINDING
AS
SELECT
t1.PK_ID1,
t2.PK_ID2,
t1.somethingelse1,
t2.somethingelse2
FROM dbo.TB_test1 t1
INNER JOIN dbo.TB_test2 t2
ON t1.PK_ID1 = t2.FK_ID1;
GO
-- Unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq
ON dbo.VI_test
(somethingelse1, somethingelse2, PK_ID1, PK_ID2);
Query:
SELECT TOP (1)
V.PK_ID1,
TT1.something1,
V.somethingelse1,
V.PK_ID2,
TT2.FK_ID1,
TT2.something2,
V.somethingelse2
FROM dbo.VI_test AS V WITH (NOEXPAND)
JOIN dbo.TB_test1 AS TT1 ON TT1.PK_ID1 = V.PK_ID1
JOIN dbo.TB_test2 AS TT2 ON TT2.PK_ID2 = V.PK_ID2
ORDER BY somethingelse1,somethingelse2;
Execution plan:
This shows the table keys being retrieved (a single row fetch from the view clustered index in order) followed by two single-row lookups on the base tables to fetch the remaining columns.
Best Answer
Window Functions, or the OVER() Clause
It appears that what you are looking for is a window function or an
OVER()
clause.In your original example, you are trying to use two
max()
conditions, which doesn't work, because when you try to then applyGROUP BY
, you can have a condition where the max of the first columnID
and the second columnstartdate
aren't in the same row, and so thenGROUP BY
simply can't be understood.However, if you are looking to extract the max over a grouping, and you can define that grouping, and you want to obtain the 'column-wise' maximum for more than one column for that grouping (as you seem to want to do), then here's the solution.
I first made this table to recreate your input data in the picture. I then applied this query.
Try out this SQL Fiddle, and see if it gives the results you're looking for. I did the best I could with the description I had.