There's no mystery: you get a good (or at least better) plan or a really bad one more or less at random, because there is no clear-cut choice of index for the optimizer to use. While it is compelling for the ORDER BY clause, because it would avoid the sort, your non-clustered index on the datetime column is a very poor choice for this query. A much better index for this query would be one on (serial_number, test_date). Even better, this would make a very good candidate for a clustered index key.
As a rule of thumb, time series should be clustered by the time column, because the overwhelming majority of requests are interested in specific time ranges. If the data is also inherently partitioned on a column with low selectivity, as seems to be the case with your serial_number, then this column should be added as the leftmost one in the clustered key definition.
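As a rough sketch of that suggestion, the clustered key could be declared as below. The table name dbo.test_results, the index name, and the data type of the variable are assumptions for illustration; only serial_number and test_date come from the question. The sample query is likewise a guess at the general shape of the workload (filter on one serial number, order by date), not the asker's actual query.

```sql
-- Hypothetical table and index names; serial_number and test_date are from the question.
-- A table can have only one clustered index, so an existing one would have to be
-- dropped (or converted to nonclustered) before this statement can succeed.
CREATE CLUSTERED INDEX cix_test_results_serial_date
    ON dbo.test_results (serial_number, test_date);

-- Assumed query shape: equality on the leading key column, ordered by the second.
-- The data type of @serial_number is an assumption.
DECLARE @serial_number varchar(50) = 'SN-0001';

SELECT serial_number, test_date
FROM dbo.test_results
WHERE serial_number = @serial_number
ORDER BY test_date;
```

With serial_number leading the key, the rows for one serial number are stored contiguously and already ordered by test_date, so a query of this shape can seek to the range and read it in order without a separate sort.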
And so enters the art of performance tuning and indexing strategies...
It seems logical to me to amend the existing index definition to include the suggested columns
I'm going to take your quote and write a third index definition:
create index [idx_index3]
on [table1] (col1, col2, col3)
include (col4, col5, col6....);
That should be the CREATE INDEX statement that corresponds to your quoted statement.
That may very well be a prudent solution, but it depends. Here are a couple of examples of what I mean by that.
If you have a common workload that mostly consists of queries like this:
select col1, col2, col3
from table1
where col1 = 1
and col2 = 2
and col3 = 3;
Then your idx_index1 index would be solid. It is perfectly narrow: an index that satisfies that query with no extraneous data in it (not taking into account the clustered index definition, if there is one at all).
But if you have a workload that consists mainly of queries like the following:
select col1, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2;
Then idx_index2 would be wise, as it is what's called a covering index, preventing the need for a key lookup back to the clustered index (or a RID lookup back to the heap). That nonclustered index by itself contains all of the data that query needs.
Your recommendation would be well suited to a query like the following:
select col1, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2
and col3 = 3;
Your idx_index3 recommendation would be a covering index that satisfies the search criteria for the above query.
The point I'm trying to get at is that an isolated question like this can't be answered definitively. It all depends on what the common and frequent workload is. Of course, you could always define all three of these indexes to handle each sample query type, but then the question becomes the maintenance required to keep these indexes updated (think: INSERTs, UPDATEs, DELETEs). That's the overhead of indexes.
You need to dissect and evaluate the workload, and determine where the benefit will be greatest. If the first sample query is by far the most common, being executed dozens of times a second, and a query like the third sample query runs only very infrequently, then it wouldn't make sense to bloat the leaf-level pages of the index with the INCLUDE nonkey columns. It all depends on your workload.
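One possible starting point for weighing reads against maintenance is the index usage DMV. This is only a sketch: it assumes the table lives in the dbo schema of the current database, and the counters it reads are reset when the instance restarts, so they only reflect activity since then.

```sql
-- Compare how often each index on table1 is read versus how often it is maintained.
-- dbo.table1 is assumed; adjust the schema and table name as needed.
SELECT i.name          AS index_name,
       s.user_seeks,                    -- reads that used a seek
       s.user_scans,                    -- reads that scanned the index
       s.user_lookups,                  -- key/RID lookups into the base table
       s.user_updates                   -- writes caused by INSERT/UPDATE/DELETE
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.table1');
```

An index with a high user_updates count and few seeks or scans is paying maintenance cost without serving the workload much, which is the trade-off described above.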
If you understand prudent indexing strategies and you understand your common workload, then by applying both you will be able to work out the best route to take.
Best Answer
Because you have an equality predicate on statusID (where statusID=2). With the reversed key order, statusID first, it is able to seek into exactly the rows matching the status, and these can then feed into a stream aggregate to do the grouping.
Your original index supports the group by requestID, statusID but not the WHERE, meaning the whole index will need to be scanned and only rows matching statusID=2 retained.
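To make the two key orders concrete, here is a minimal sketch. The table name dbo.request_log, both index names, and the exact select list are assumptions; only requestID, statusID, the WHERE predicate, and the GROUP BY come from the question.

```sql
-- Hypothetical table and index names; column names are from the question.

-- Original key order (assumed): matches the GROUP BY, but statusID is not the
-- leading column, so the statusID = 2 filter cannot be used to seek.
CREATE INDEX ix_request_status ON dbo.request_log (requestID, statusID);

-- Reversed key order: statusID leads, so the equality predicate can seek
-- straight to the matching rows, which arrive ordered by requestID.
CREATE INDEX ix_status_request ON dbo.request_log (statusID, requestID);

-- Assumed query shape.
SELECT requestID, statusID, COUNT(*) AS row_count
FROM dbo.request_log
WHERE statusID = 2
GROUP BY requestID, statusID;
```

Comparing the actual execution plans with each index in place should show whether the optimizer chooses a seek plus stream aggregate for the second one, as described above.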