I love your careful approach to query tuning and to reviewing options and plans. I wish more developers did this. One caution: always test with a lot of rows and look at the logical reads, because this is a smallish table. Try to generate a sample load and run the query again. One small issue: in your top query you are not asking for an ORDER BY, while in your bottom query you are. You should compare and contrast them each with ordering.
I just quickly created a SalesOrders table with 200,000 sales orders in it - still not huge by any stretch of the imagination - and ran the queries with the ORDER BY in each. I also played with indexes a bit.
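The setup script isn't shown here, but something along these lines is a quick way to generate that kind of sample load (the table and column names are assumptions based on the discussion, not the asker's actual schema):

CREATE TABLE dbo.SalesOrders
(
    OrderID INT IDENTITY(1,1) NOT NULL,
    CustID  INT NOT NULL
);

-- Insert 200,000 orders spread across roughly 10,000 customers.
INSERT INTO dbo.SalesOrders (CustID)
SELECT TOP (200000) ABS(CHECKSUM(NEWID())) % 10000 + 1
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;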
With no clustered index on OrderID, just a non-clustered index on CustID, the second query outperformed, especially with the ORDER BY included in each. There were twice as many reads on the first query as on the second, and the cost percentages were 67% / 33% between the queries.
With a clustered index on OrderID and a non-clustered index just on CustID, they performed at a similar speed with the exact same number of reads.
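For reference, the two index setups described above would look something like this (the index names are mine):

-- Scenario 1: heap with only a non-clustered index on CustID
CREATE NONCLUSTERED INDEX IX_SalesOrders_CustID ON dbo.SalesOrders (CustID);

-- Scenario 2: add a clustered index on OrderID as well
CREATE CLUSTERED INDEX CX_SalesOrders_OrderID ON dbo.SalesOrders (OrderID);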
So I would suggest you increase the number of rows and do some more testing. But my final analysis on your queries -
You may find them behaving more similarly than you realize when you increase the rows, so keep that caveat in mind and test that way.
If all you ever want to return is the maximum OrderID for each customer, and you want to determine that by the OrderID being the greatest OrderID, then the second query of these two is the best way to go to my mind - it is a bit simpler, and while ever so slightly more expensive based on subtree cost, it is a quicker and easier statement to decipher. If you intend on adding other columns into your result set someday, then the first query allows you to do that.
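The question's exact queries aren't reproduced here, but the trade-off described above typically plays out between two shapes like these (a sketch, not the asker's actual code): a windowed query that can carry extra columns, and a plain GROUP BY that can't.

-- "First query" shape: lets you add other columns to the result set later.
SELECT OrderID, CustID          -- any other columns could ride along here
FROM (
    SELECT OrderID, CustID,
           ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY OrderID DESC) AS rn
    FROM dbo.SalesOrders
) x
WHERE rn = 1
ORDER BY CustID;

-- "Second query" shape: simpler, but limited to the grouped columns.
SELECT CustID, MAX(OrderID) AS OrderID
FROM dbo.SalesOrders
GROUP BY CustID
ORDER BY CustID;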
Updated:
One of your comments under your question was:
Please keep in mind, that finding the best query in this question is a means of refining the techniques used for comparing them.
But the best takeaway for doing that: test with more data. Always make sure you have data consistent with production and with expected future production. Query plans can start looking quite different when you give the tables more rows, so try to keep the distribution close to what you'd expect in production. And pay attention to things like including the ORDER BY or not; here I don't think it makes a terrible difference in the end, but it is still worth digging into.
Your approach of comparing at this level of detail and data is a good one. Subtree costs are mostly arbitrary and close to meaningless, but still worth at least looking at for comparison between edits/changes or even between queries. Looking at the time statistics and the IO is quite important, as is looking at the plan for anything that feels out of place for the size of the data you are working with and what you are trying to do.
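A quick way to get those time and IO statistics in SSMS before each test run:

-- Report per-statement CPU/elapsed time and logical reads on the Messages tab.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;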
The non-clustered index you have tested is not the best for this query. It can be used for the WHERE clause and for doing an index scan instead of a full table scan, but it cannot be used for the GROUP BY.

The best possible index would have to be a partial (filtered) index (to filter the unwanted rows from the WHERE clause), then have all the columns used in the GROUP BY, and then INCLUDE all the other columns used in the SELECT:
CREATE INDEX special_ix
    ON dbo.Commissions_Output
    ( company, location, account,
      salesroute, employee, producttype,
      item, loadjdate, commissionrate )          -- every GROUP BY column as a key
  INCLUDE
    ( [Extended Sales Price], [Delivered Qty] )  -- remaining SELECT columns, covered
  WHERE
    ( [Extended Sales Price] <> 0 ) ;            -- filter matching the query's WHERE
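For context, the query shape that index is designed to cover would look roughly like this (the aggregate functions are my assumption; the original question's query isn't shown here):

SELECT company, location, account,
       salesroute, employee, producttype,
       item, loadjdate, commissionrate,
       SUM([Extended Sales Price]) AS TotalSales,   -- assumed aggregates
       SUM([Delivered Qty])        AS TotalQty
FROM dbo.Commissions_Output
WHERE [Extended Sales Price] <> 0
GROUP BY company, location, account,
         salesroute, employee, producttype,
         item, loadjdate, commissionrate;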
Best Answer
If a query has more CPU time than elapsed time, that means it contained a zone that executed in parallel. The ratio of CPU time to elapsed time depends on many factors, including the number of CPUs available to the query, what the query waited on and how long those waits were (which partially depends on server workload), and how efficiently the query is able to take advantage of parallelism. For an example of the last point, see Amdahl's Law:
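In its usual form (my notation, not from the original answer), with p the fraction of the work that can run in parallel and N the number of processors, the theoretical speedup is:

    speedup(N) = 1 / ((1 - p) + p / N)

So even a query that is 90% parallelizable tops out at a 10x speedup no matter how many cores you add.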
Many queries will have zones that run in parallel and zones that run in serial. If most of the work is done in the parallel zones and the work is evenly distributed among schedulers then you might see a high CPU time to elapsed time ratio.
Below is a bit of a contrived example of a query that can parallelize extremely efficiently in SQL Server. First note that the machine I'm testing against has 4 cores and all of them are available to the query (which I verified both in Task Manager and from SQL Server).
Now I'll put 100 rows into one temp table and 100k rows into the other:
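The original setup script isn't included above; here is a minimal sketch consistent with the description (the table names come from the walkthrough below, everything else is assumed):

CREATE TABLE #OUTER_TABLE (ID INT NOT NULL);
CREATE TABLE #INNER_TABLE (ID INT NOT NULL);

-- 100 rows for the outer table.
INSERT INTO #OUTER_TABLE (ID)
SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

-- 100,000 rows for the inner table.
INSERT INTO #INNER_TABLE (ID)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;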
Here is the query that I'm going to test with:
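The original query isn't reproduced here. One shape that would behave as described - every outer row forcing a full scan of the inner table, with the outer rows spread across threads - is something like this sketch (the always-true predicate is my assumption; its only job is to hide the relationship from the optimizer):

SELECT o.ID,
       a.inner_id
FROM #OUTER_TABLE o
CROSS APPLY (
    SELECT MAX(i.ID) AS inner_id
    FROM #INNER_TABLE i
    WHERE i.ID >= o.ID - 1000000   -- always true for this data, but the optimizer can't know that
) a
OPTION (MAXDOP 4);                 -- vary MAXDOP for the tests below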
It's a bit of a mess so let's walk through it. Based on the data in the tables, that query will return all 100 rows from #OUTER_TABLE along with any value from #INNER_TABLE. However, I have not made this information available to the query optimizer, so it will do a lot of work to get the result that we're expecting. Overall the query is silly, but what's important is how the query executes. Here's a screenshot:

I'm going to skip over why I wrote the query that way because it isn't relevant to the question. However, one thing to note is that the 100 rows from #OUTER_TABLE are split into four threads with 25 rows per thread. Each thread does a full scan of #INNER_TABLE per row, so 4 threads do 25 scans each. The part of the query on the inner side of the loop does almost all of the work, as you might expect (the outer part scans 100 rows, but the inner part scans 100 * 100,000 = 10,000,000 rows).

When I run the query with MAXDOP 4 (using all four cores) I get the following results from SET STATISTICS TIME ON:

Running with MAXDOP 2 (using just two cores) gives the following results:
Running with MAXDOP 1 (not in parallel) gives the following results:
As you can see, the query can very efficiently take advantage of parallelism. Most queries in the wild will not have such a relatively high CPU time to elapsed time ratio (after factoring in DOP).
For an example of when the elapsed time can be greater than CPU time, consider the time that it takes to send the results to a client, such as to a SQL Server Management Studio results grid. The time to send those results will be included in the elapsed time but won't factor that heavily into CPU time. The work that the server does to read the results from memory or disk will contribute to both CPU time and elapsed time. The time spent sending results to the client is one type of wait event (usually ASYNC_NETWORK_IO). There are many different reasons why a SQL Server query could be waiting, and not all of them affect CPU utilization in the same way.

Suppose I run this query:
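The query itself isn't shown above; any statement that streams a large result set to the SSMS grid will do, for example this sketch against the test tables from earlier (my choice, not the original):

SELECT i.ID
FROM #INNER_TABLE i;   -- 100,000 rows sent to the client grid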
Getting the results to the client takes 94 ms of CPU time and 749 ms of elapsed time. I can use sys.dm_exec_session_wait_stats to see the wait events for the session:
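A query along these lines will show them (the session_id filter is an assumption; use the @@SPID of the session that ran the test):

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 56        -- assumed: the test session's @@SPID
ORDER BY wait_time_ms DESC;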
I can also run the query in a different session without returning the result set to the client. In one test this took 47 ms of CPU time and 60 ms of elapsed time. Here are the wait stats from that query, which are quite different from the other one's:
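The answer doesn't show how it discarded the result set; one common trick is to assign the column to a variable so the rows never leave the server:

DECLARE @dummy INT;

SELECT @dummy = i.ID     -- rows are read but never sent to the client
FROM #INNER_TABLE i;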