When performing a MIN() or MAX() over a single range covered by an appropriately sorted index, SQL Server does a TOP() and so returns the value after fetching just one row. When the search criteria include more than one range, SQL Server instead grabs all the indexed values from both of the ranges and does a stream aggregate, which is far slower than performing a TOP() for each sub-value.
For example, assume a large number of orders per customer in a table like:
CREATE TABLE orders
(
customer_id int,
quantity int
)
Running this query:
SELECT MAX(quantity)
FROM orders
WHERE customer_id IN (1,2)
will result in a query that takes several times as long as if only one customer ID were specified.
What is the most efficient way to perform a query like the above? Relatedly, if separate results were needed (i.e. GROUP BY customer_id), what would the best method be?
SQL Fiddle: http://sqlfiddle.com/#!3/ef0c6/1
Best Answer
Here's a solution using
CROSS APPLY
, which does the sameTOP
query for eachcustomer_id
:This does the same work as the
UNION ALL
-based query you wrote in the Fiddle; the difference is that thecustomer_id
input is abstracted from the meat of the query, so it can easily be converted to use a table variable or table-valued parameter, which will result in a static query plan, which is important. This approach will work well for a small number ofcustomer_id
values, and simply removing the outerMAX
will return the maximum for each customer. I don't believe there's a way to further optimize this query for a small number ofcustomer_id
s using these data structures (assuming thecustomer_id
s are random, and not a range).For a large number of
customer_id
s, it probably is cheaper to do the index scan and stream aggregate than many seeks. To get this going faster, you'd have to move to some kind of denormalized data structure.MAX
isn't supported in an indexed view, so rolling your own mechanism is the only way to go, either in application logic or triggers. Depending on the read/write ratio on this table, that may or may not be faster than the above approach: you'd have to test it in your exact scenario.