You really don't need partitioning for this to work efficiently, which is what I told you on your last question as well...even if you have billions of rows.
If you cluster on the BarcodeID (which I am assuming is unique) and put a nonclustered index on LoyaltyCardID, it should work just fine. These are NOT complicated queries with a lot of additional logic, from the sounds of things, and simple seek operations are extremely efficient on their own.
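A minimal sketch of that arrangement, assuming a table named dbo.Cards (the real table name is not given in the question):

```sql
-- Sketch only; dbo.Cards is an assumed table name.
-- Clustered index on the (assumed unique) barcode.
CREATE UNIQUE CLUSTERED INDEX CIX_Cards_BarcodeID
    ON dbo.Cards (BarcodeID);

-- Nonclustered index to support seeks on the loyalty card id.
CREATE NONCLUSTERED INDEX IX_Cards_LoyaltyCardID
    ON dbo.Cards (LoyaltyCardID);
```

With these in place, lookups by either column resolve to single-index seeks, which stay cheap even at billions of rows.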
Are you getting pressure to partition or is it just something that you have decided to do?
Basically seems to be picking the MAX id from a partition, not max across all partitions
Writing TOP (1) without an ORDER BY clause to define which row is 'top' means the query processor is logically free to return any row from the set. The query plan selected by the optimizer happens to return a particular row (the highest id from the first partition), but you cannot rely on this, even if it were a useful result.
Whenever you use TOP, you should always specify an ORDER BY at the same scope to produce deterministic behaviour - unless you really do not care which row(s) come back.
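For example, a deterministic form of the query (assuming the goal is the highest id overall) would be:

```sql
-- The ORDER BY defines which row counts as 'top',
-- so the result no longer depends on the chosen plan.
SELECT TOP (1) id
FROM dbo.tableA
ORDER BY id DESC;
```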
Given the table size SELECT MAX(id) FROM tableA will not perform well enough
The optimizer lacks the logic to transform a scalar MAX or MIN aggregate over a partitioned index into a global aggregate over per-partition aggregates. Itzik Ben-Gan explains the limitation and provides a general workaround in this article.
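The general workaround looks roughly like the following sketch: compute the MAX within each partition (a cheap seek per partition), then take the MAX of those. The partition function name pf_tableA and partitioning on the id column are assumptions here:

```sql
-- Per-partition MAX, then a global MAX over those results.
-- Assumes partition function pf_tableA applied to column id.
SELECT MAX(m.max_id) AS max_id
FROM sys.partitions AS p
CROSS APPLY
(
    SELECT MAX(t.id) AS max_id
    FROM dbo.tableA AS t
    WHERE $PARTITION.pf_tableA(t.id) = p.partition_number
) AS m
WHERE p.object_id = OBJECT_ID(N'dbo.tableA')
  AND p.index_id IN (0, 1);  -- heap or clustered index rows only
```

Each inner query eliminates all but one partition, so the optimizer can use a backward range scan within that partition instead of scanning everything.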
If the highest partition number is known and guaranteed not to change, the workaround of specifying a literal partition using the $partition function will work, though it may fail in a non-obvious way if the partitioning strategy changes in the future. This 'solution' works by eliminating all but one partition, resulting in a simple seek on one partition of the index.
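For illustration, the literal-partition form might look like this (the partition function name pf_tableA, partitioning on id, and the partition number 10 are all assumptions):

```sql
-- Assumes partitioning on id via pf_tableA, and that partition 10
-- is, and will remain, the highest-numbered partition.
SELECT MAX(id)
FROM dbo.tableA
WHERE $PARTITION.pf_tableA(id) = 10;
```

If a new partition is later added above number 10, this query silently returns a stale answer, which is the non-obvious failure mode mentioned above.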
Adding an order by id does not improve performance for some reason
The same optimizer limitation broadly applies to TOP (1) ... ORDER BY. The ORDER BY makes the result deterministic, but it does not help produce a more efficient plan in this particular case (but see below).
Implied Index Keys
Your index is on id DESC, timeSampled DESC. In SQL Server 2008 and later, partitioning introduces an extra implied leading key on $partition ASC (it is always ascending; it is not configurable), making the full index key $partition ASC, id DESC, timeSampled DESC.
Since id and timeSampled increase together (though there is nothing in the schema to guarantee this), you could rewrite the query as TOP (1) ... ORDER BY $partition DESC, id DESC. Unfortunately, the combination of the DESC keys on your index and the ASC implied leading key on $partition means the index cannot be used to scan just one row in key order.
If your index keys were instead id ASC, timeSampled ASC, the whole index key would be $partition ASC, id ASC, timeSampled ASC. This all-ASC index could be scanned backward, returning just the first row in key order. That row would be guaranteed to have the highest id value in the highest-numbered partition. Given the (unenforced) relationship between id and partition id, this would produce the correct result with an optimal execution plan that reads just a single row.
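A sketch of that rewrite, again assuming a partition function named pf_tableA on id and the hypothetical all-ascending index:

```sql
-- Requires an index keyed id ASC, timeSampled ASC so the implied key
-- $partition ASC, id ASC, timeSampled ASC can be scanned backward.
SELECT TOP (1) id
FROM dbo.tableA
ORDER BY $PARTITION.pf_tableA(id) DESC, id DESC;
```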
This 'solution' lacks integrity because the id-timeSampled relationship is not enforced, and you probably do not want to rebuild the nonclustered primary key anyway. Nevertheless, I mention it because it may enhance your understanding of how partitioning interacts with indexes.
Best Answer
With partitioning you would have to create one partition per day, which puts the pre-SQL Server 2012 limit of 1,000 partitions in a new perspective, as it would allow only about three years of archive. SQL Server 2012 raises the limit to 15,000 partitions, which is plenty for one partition per day.
Every day you would add a new partition. If you want to move the partition for the 61st day back to another filegroup, you can do it efficiently, but it is still an offline operation. See Move a Partition to a Different File Group Efficiently.
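The daily maintenance step might be sketched as follows, with pf_Daily and ps_Daily as hypothetical names for a RANGE RIGHT partition function and scheme on a date column:

```sql
-- Each day: tell the scheme where the next partition will live,
-- then split off a new boundary for the coming day.
ALTER PARTITION SCHEME ps_Daily
    NEXT USED [PRIMARY];

ALTER PARTITION FUNCTION pf_Daily()
    SPLIT RANGE ('20240101');  -- example boundary literal for the next day
```

Splitting an empty partition at the leading edge is a metadata-only operation, which is why the daily add stays cheap.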
All your indexes would have to be aligned, see Special Guidelines for Partitioned Indexes.
Buying into partitioning is not an easy decision and it may be quite a big bite to chew; see How To Decide if You Should Use Table Partitioning. Specifically, you should not expect performance improvements from partitioning. You should approach performance problems on time series by clustering on datetime.