Sql-server – Non-Clustered Index key columns to cover a variable where clause

execution-planindexindex-tuningperformancesql server

To whoever reads this, sorry for the long question and thanks so much for your time.

I'm needing some help on understanding what columns should be key/non key in the following situation, and what potential impacts it might have on performance either way.

We're using SQL Server 2008 but must support 2005 and in our application, we have a query which is hit constantly but the where clause changes depending on how the query is hit.

It ALWAYS includes 3 columns in the where (lets call them a1, a2, a3), and varaibly includes some of the others (lets call them v1,v2,v3), and always selects all of the columns.

so what we're looking at is a query that always looks like;

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
...
order by table_order

where sometimes (but not too often) the where clause contains one of or combinations of;

and v1 = ?
and v2 = ?
and v3 = ?

Is there a best practice for this sort of situation? From what I understand about indexing, I should be make a non-clustered index that has key columns for stuff in the where clause and everything else selected as a non key column;

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order) 
INCLUDE (id, v1, v2, v3, data1, data2, data3)

My Question here is should I be including v1, v2, v3 in as key columns given that sometimes they are there? IE;

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order, v1, v2, v3) 
INCLUDE (id, data1, data2, data3)

Would that help with performance given that this table is queried, inserted to, updated to, etc. all the time? Or should I have different indexes for each possible iteration (IE an index with a1-3, table_order & v1 as key columns for example)? IE;

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order, v1) 
INCLUDE (id, v2, v3, data1, data2, data3)

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order, v2) 
INCLUDE (id, v1, v3, data1, data2, data3)

etc.

I should mention here that DB size is of concern so I don't really want to do the multiple indexes if I can help it.

I guess what this really boils down to is that I'm wondering if the query optimizer will use the one index that has all of the key columns listed in all of these different situations?

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
order by table_order

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
and v1 = ?
order by table_order

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
and v1 = ?
and v2 = ?
order by table_order

etc.

Or am I forced into having to do multiple indexes to satisfy each condition?

Given that the where clause 90% of the time only have a1, a2, a3, should I just include the v columns in as non key columns because they're selected? Does the inclusion of the extra key columns hurt performance for the 90% use case?

Could specifying so many key columns in the order I'm looking to specify cause performance concerns on simple joins like;

inner join example_table
on a1 = ?

or would the query optimizer just not use this index whatsoever for that kind of simple join? If so, I'm going to need a separate index like the follow, right?

CREATE NONCLUSTERED INDEX
ON example_table(a1) 

Best Answer

That's a great question.
And there are good answers.

The engine definitely will use the index even if you don't use every key column.
That's especially so if they are in order, as you are talking about.
(can anyone else speak to different orders of key columns?)

You will benefit just fine from selecting just on the first column alone as a key, or multiple columns.
What will make a difference - for any index - is staying inside the INCLUDEd columns.
No matter how many key columns you use in your Where, the performance hit for having to go back to the primary key for additional columns can be huge as it doubles the "operations".

When it comes to dealing with performance vs. size, you have the same problem as with any index.
Since you know you want the same columns returned in all cases, if you are READ focused, you will probably want to the index with all 6, if you INCLUDE everything.
It will certainly save you db size compared to making both indexes.

On WRITE, you obviously have a bigger burden with a larger index. That is a significant additional amount of sorting.
If you do just one row inserted at a time, maybe it won't hardly matter at all.
If you do bulk inserts, you'll definitely want to test the two indexes to see the write performance for your actual inserts.