An example from this question shows that SQL Server will choose a full index scan to solve a query like this:
select distinct [typeName] from [types]
Where [typeName] has a non-clustered, non-unique ascending index on it. His example has 200M rows but only 76 unique values. It seems like a seek plan would be a better choice with that density (~76 multiple binary searchs)?
His case could be normalized but the reason for the question is that I really want to solve something like this:
select TransactionId, max(CreatedUtc)
from TxLog
group by TransactionId
There is an index on (TransactionId, MaxCreatedUtc)
.
Re-writing using a normalized source (dt) does not not change the plan.
select dt.TransactionId, MaxCreatedUtc
from [Transaction] dt -- distinct transactions
cross apply
(
select Max(CreatedUtc) as MaxCreatedUtc
from TxLog tl
where tl.TransactionId = dt.TransactionId
) ca
Running just the CA subquery as a Scalar UDF does show a plan of 1 seek.
select max(CreatedUtc) as MaxCreatedUtc
from Pub.TransactionLog
where TransactionID = @TxId;
Using that Scalar UDF in the original query seems to work but loses parallelism (known issue with UDFs):
select t.typeName,
Pub.ufn_TransactionMaxCreatedUtc(t.TransactionId) as MaxCreatedUtc
from Pub.[Transaction] t
Rewriting using an Inline TVF reverts it back to the scan based plan.
From answer/comment @ypercube:
select TransactionId, MaxCreatedUtc
from Pub.[Transaction] t
cross apply
(
select top (1) CreatedUtc as MaxCreatedUtc
from Pub.TransactionLog l
where l.TransactionID = t.TransactionId
order by CreatedUtc desc
) ca
Plan looks good. No parallelism but pointless since so fast. Will have to try this on a larger problem sometime. Thanks.
Best Answer
I have exactly the same set up and I've been through the same stages of rewriting the query.
In my case the table names and meaning is a bit different, but overall structure is the same. Your table
Transactions
corresponds to my tablePortalElevators
below. It has ~2000 rows. Your tableTxLog
corresponds to my tablePlaybackStats
. It has ~150M rows. It has index on(ElevatorID, DataSourceRowID)
, same as you.I'll run several variants of the query on the real data and compare execution plans, IO and time stats. I'm using SQL Server 2008 Standard.
GROUP BY with MAX
Same as for you optimizer scans the index and aggregates results. Slow.
Individual row
Let's see what optimizer would do if I requested
MAX
just for one row:Optimizer is smart enough to use the index and it does one seek. By the way, we can see that optimizer uses
TOP
operator, even though the query doesn't have it. This is a telling sign that optimization paths ofMAX
andTOP
have something in common in the engine, but they are different as we'll see below.CROSS APPLY with MAX
Optimizer still scans the index. It is not smart enough to convert
MAX
intoTOP
and scan into seeks here. Slow. I didn't think of this variant originally, my next try was scalar UDF.Scalar UDF
I saw that plan for getting
MAX
for individual row had index seek, so I put that simple query in a scalar UDF.It does run fast. At least, much faster than
Group by
. Unfortunately, execution plan doesn't show details of UDF and what is even worse, it doesn't show the real IO stats (it doesn't include IO generated by UDF). You need to run Profiler to see all calls of the function and their stats. This plan shows only 6 reads. The plan for individual row has 4 reads, so real number would be close to:6 + 2779 * 4 = 6 + 11,116 = 11,122
.CROSS APPLY with TOP
Eventually, I discovered the
CROSS APPLY
and how it can be applied ;-) in this case.Here optimizer is smart enough to do ~2000 seeks. You can see that number of reads is much lower than for
group by
. Fast.Interestingly, number of reads here (11,850) is a bit more than reads that I estimated with UDF (11,122). Table IO stats with
CROSS APPLY
have 11,844 reads and 2,779 scan count of the large table, which gives11,844 / 2,779 ~= 4.26
reads per index seek. Most likely, seeks for some values use 4 reads and for some 5, with average 4.26. There are 2,779 seeks, but there are values only for 2,130 rows. As I said, it is difficult to get real number of reads with UDF without profiler.Recursive CTE
As was pointed in comments, Paul White described a Recursive Index Skip Scan method to find distinct values in a large table without performing a full index scan, but doing index seeks recursively. To start recursion we need to find the
MIN
orMAX
value for an anchor and then each step of recursion adds next value one by one. The post explains it in details.It is pretty fast, though it performs almost twice amount of reads as
CROSS APPLY
. It does 12,781 reads inWorktable
and 8,524 inPlaybackStats
. On the other hand, it performs as many seeks as there are distinct values in the large table.CROSS APPLY
withTOP
performs as many seeks as there are rows in the small table. In my case small table has 2,779 rows, but large table has only 2,130 distinct values.Summary
I ran each query three times and picked the best time. There were no physical reads.
Conclusion
In this special case of
greatest-n-per-group
problem we have:n=1
;Two best methods are:
In case when we have a small table with the list of groups, the best method is
CROSS APPLY
withTOP
.In case when we have only large table, the best method is
Recursive Index Skip Scan
.