Basically seems to be picking the MAX id from a partition, not max across all partitions
Writing TOP (1)
without an ORDER BY
clause to define which row is 'top' means the query processor is logically free to return any row from the set. The query plan selected by the optimizer happens to return a particular row (highest id from the first partition) but you cannot rely on this, even if it were a useful result.
Whenever you use TOP
you should always specify an ORDER BY
at the same scope to produce deterministic behaviour - unless you really do not care which row(s) come back.
Given the table size SELECT MAX(id) FROM tableA will not perform well enough
The optimizer is lacking some logic to transform a scalar MAX
or MIN
aggregate over a partitioned index to a global aggregate over per-partition aggregates. Itzik Ben-Gan explains the limitation and provides a general workaround in this article.
If the highest partition number is known and guaranteed not to change, the workaround to specify a literal partition using the $partition
function will work, though it may fail in a non-obvious way if the partitioning strategy changes in future.
This 'solution' works by eliminating all but one partition, resulting in a simple seek on one partition of the index.
Adding an order by id does not improve performance for some reason
The same optimizer limitation broadly applies to TOP (1) ... ORDER BY
. The ORDER BY
makes the result deterministic, but does not help produce a more efficient plan in this particular case (but see below).
Implied Index Keys
Your index is on id DESC, timeSampled DESC
. In SQL Server 2008 and later, partitioning introduces an extra implied leading key on $partition ASC
(it is always ascending, it is not configurable) making the full index key $partition ASC, id DESC, timeSampled DESC
.
Since id
and timeSampled
increase together (though there is nothing in the schema to guarantee this) you could rewrite the query as TOP (1) ... ORDER BY $partition DESC, id DESC
. Unfortunately, the DESC
keys on your index and ASC
implied leading key $partition
means the index could not be used to scan just one row from the index in order.
If your index keys were instead id ASC, timeSampled ASC
the whole index key would be $partition ASC, id ASC, timeSampled ASC
. This all-ASC
index could be scanned backward, returning just the first row in key order. This row would be guaranteed to have the highest id
value in the highest-numbered partition. Given the (unenforced) relationship between id and partition id, this would produce the correct result with an optimal execution plan that reads just a single row.
This 'solution' lacks integrity because the id-timeSampled relationship is not enforced, and you probably do not want to rebuild the nonclustered primary key anyway. Nevertheless, I mention it because it may enhance your understanding of how partitioning interacts with indexes.
I initially thought you were on to something here. Working assumption was along the lines that perhaps the buffer pool wasn't immediately flushed as it requires "some work" to do so and why bother until the memory was required. But...
Your test is flawed.
What you're seeing in the buffer pool is the pages read as a result of re-attaching the database, not the remains of the previous instance of the database.
And we can see that the buffer pool was not totally blown away by the
detach/attach. Seems like my buddy was wrong. Does anyone disagree or
have a better argument?
Yes. You're interpreting physical reads 0
as meaning there were not any physical reads
Table 'DatabaseLog'. Scan count 1, logical reads 782, physical reads
0, read-ahead reads 768, lob logical reads 94, lob physical reads 4,
lob read-ahead reads 24.
As described on Craig Freedman's blog the sequential read ahead mechanism tries to ensure that pages are in memory before they're requested by the query processor, which is why you see zero or a lower than expected physical read count reported.
When SQL Server performs a sequential scan of a large table, the
storage engine initiates the read ahead mechanism to ensure that pages
are in memory and ready to scan before they are needed by the query
processor. The read ahead mechanism tries to stay 500 pages ahead of
the scan.
None of the pages required to satisfy your query were in memory until read-ahead put them there.
As to why online/offline results in a different buffer pool profile warrants a little more idle investigation. @MarkSRasmussen might be able to help us out with that next time he visits.
Best Answer
Try using
sys.dm_db_partition_stats
instead ofsys.partitions
, as in:For Azure SQL Data Warehouse, you'll need to use
sys.dm_pdw_nodes_db_partition_stats
instead ofsys.dm_db_partition_stats
, even though they contain the same details.Note, I removed the
CONVERT(date,...)
functionality so this code is compatible with all partition schemes, not just those with date range values.In the on-prem version of SQL Server,
sys.partitions
gets its row counts from the internal tableALUCOUNT
orsys.sysrowsets
, ifALUCOUNT.rows
is NULL. The definition ofsys.partitions
is:The on-prem version of
sys.dm_db_partition_stats
gets its row counts differently, from the internal table,PARTITIONCOUNTS
:While both
sys.partitions
andsys.dm_db_partition_stats
should both have correct row counts, I'd put more trust in thePARTITIONCOUNTS
internal table.