Sql-server – Change of index key columns order affects performance

Tags: database-tuning-advisor, nonclustered-index, profiler, sql-server

I have the following table in my database:

create table [dbo].[tb_StatusLog](
[RequestID] [int] not null,
[statusID] [int] not null,
[startTime] [datetime] not null,
[endTime] [datetime] null,
)

with the following non-clustered index

create nonclustered index [index1] on [dbo].[tb_StatusLog]
(
requestID asc, 
statusID asc
)
include ([startTime])

and the following table-valued function fn_getTable

select requestID, max(startTime)
from tb_StatusLog
where statusID=2
group by requestID, statusID

After capturing a Profiler trace and running the results through the Database Engine Tuning Advisor, the advisor proposed that I create the following index (the same index I already have, but with the key columns reversed):

create nonclustered index [index2] on [dbo].[tb_StatusLog]
(
statusID asc, 
requestID asc
)
include ([startTime])

Now, when running fn_getTable, the execution plan uses index2 instead of index1 and the performance improved.

Why is that?

Best Answer

Because you have an equality predicate on statusID (where statusID=2).

With the reversed order it is able to seek into exactly the rows matching the status and these can then feed into a stream aggregate to do the grouping.

Your original index supports the group by requestID, statusID but not the WHERE - meaning the whole index has to be scanned, with only the rows matching statusID=2 retained.
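
One way to check this on your own data is to force each index in turn and compare the plans and logical reads. This is only a sketch: it assumes both index1 and index2 exist on dbo.tb_StatusLog, and the alias lastStartTime is made up for illustration.

```sql
-- Report logical reads per statement so the two plans can be compared.
set statistics io on;

-- Force the original index (requestID, statusID). statusID is not the
-- leading key, so expect an index scan with statusID = 2 applied as a filter.
select requestID, max(startTime) as lastStartTime  -- alias is illustrative
from dbo.tb_StatusLog with (index(index1))
where statusID = 2
group by requestID, statusID;

-- Force the suggested index (statusID, requestID). Expect an index seek on
-- statusID = 2, returning rows already ordered by requestID for the aggregate.
select requestID, max(startTime) as lastStartTime
from dbo.tb_StatusLog with (index(index2))
where statusID = 2
group by requestID, statusID;

set statistics io off;
```

Run it with the actual execution plan enabled; the difference should be larger the smaller the share of rows with statusID=2.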

Related Solutions

Sql-server – SQL Server 2008 datetime index performance bug

There's no mystery: you get a good(er) or a (really) bad plan basically at random, because there is no clear-cut choice of index to use. While it is compelling for the ORDER BY clause, since it avoids the sort, your non-clustered index on the datetime column is a very poor choice for this query. A much better index for this query would be one on (serial_number, test_date). Even better, this would make a very good candidate for a clustered index key.

As a rule of thumb, time series should be clustered by the time column, because the overwhelming majority of requests are interested in specific time ranges. If the data is also inherently partitioned on a column with low selectivity, as seems to be the case with your serial_number, then this column should be added as the leftmost one in the clustered key definition.
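
As a rough sketch, the suggested clustered key could be declared like this. The table name dbo.test_results and the index name are hypothetical, since the original question's table is not shown here; only serial_number and test_date come from the answer.

```sql
-- Hypothetical table and index names; serial_number and test_date are the
-- columns named in the answer above.
-- Assumes the table does not already have a clustered index
-- (a table can have only one).
create clustered index cix_test_results_serial_date
on dbo.test_results (serial_number, test_date);
```

With this key, a query that filters on one serial_number and a test_date range reads a single contiguous slice of the table, already in test_date order.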

Sql-server – One index or two

And so enters the art of performance tuning and indexing strategies...

"It seems logical to me to amend the existing index definition to include the suggested columns"

I'm going to take your quote and write a third index definition:

create index [idx_index3]
on [table1] (col1, col2, col3)
include (col4, col5, col6....);

That should be the CREATE INDEX statement that corresponds to your quoted statement.

That very well may be a prudent solution, but it depends. Here are a couple of examples when I say that it depends.

If you have a common workload that mostly consists of queries like this:

select col1, col2, col3
from table1
where col1 = 1
and col2 = 2
and col3 = 3;

Then your idx_index1 index would be solid. It is perfectly narrow: an index that satisfies that query with no extraneous data in it (not taking into account the clustered index definition, if there is one at all).

But if you have workload that consists of queries mainly like the following:

select col1, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2;

Then idx_index2 would be wise, as it is what's called a covering index preventing the need for a key lookup back to the clustered index (or a RID lookup back to the heap). That nonclustered index definition would solely encompass all of the data that query needs.

With your recommendation, it would be well suited for a query like the following:

select col1, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2
and col3 = 3;

Your idx_index3 recommendation would be a covering index that satisfies the search criteria for the above query.

The point I'm trying to get at is that, for an isolated question like this, we can't give a definitive answer. It all depends on what the common and frequent workload is. Of course you could always define all three of these indexes to handle each sample query type, but then the maintenance required to keep these indexes updated comes into question (think: INSERTs, UPDATEs, DELETEs). That's the overhead of indexes.

You need to dissect and evaluate the workload, and determine where the advantages will be greatest. If the first sample query is by far the most common, being executed dozens of times a second, and a query like the third sample query runs only very infrequently, then it wouldn't make sense to bloat the leaf-level pages of the index with the INCLUDE nonkey columns. It all depends on your workload.

If you understand prudent indexing strategies, and you understand your common workload, then by applying both of those you will be able to come up with what is the best route to take.
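
One way to put numbers on that trade-off is to compare how often each index is read with how often it has to be maintained. The sketch below uses the sys.dm_db_index_usage_stats view; the dbo schema for table1 is an assumption, and the counters only cover activity since they were last reset (for example by an instance restart).

```sql
-- Reads (seeks, scans, lookups) versus writes (updates) for each index on
-- table1. Indexes with no recorded activity show NULL counters.
select i.name as index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
from sys.indexes as i
left join sys.dm_db_index_usage_stats as s
       on s.object_id = i.object_id
      and s.index_id = i.index_id
      and s.database_id = db_id()
where i.object_id = object_id('dbo.table1')  -- dbo schema is an assumption
order by i.index_id;
```

An index with many user_updates and few seeks or scans is mostly costing you maintenance, which is a hint that it may not be earning its keep.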
