Sql-server – Non-Clustered Index key columns to cover a variable where clause

execution-planindexindex-tuningperformancesql server

To whoever reads this, sorry for the long question and thanks so much for your time.

I'm needing some help on understanding what columns should be key/non key in the following situation, and what potential impacts it might have on performance either way.

We're using SQL Server 2008 but must support 2005 and in our application, we have a query which is hit constantly but the where clause changes depending on how the query is hit.

It ALWAYS includes 3 columns in the where (lets call them a1, a2, a3), and varaibly includes some of the others (lets call them v1,v2,v3), and always selects all of the columns.

so what we're looking at is a query that always looks like;

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
...
order by table_order

where sometimes (but not too often) the where clause contains one of or combinations of;

and v1 = ?
and v2 = ?
and v3 = ?

Is there a best practice for this sort of situation? From what I understand about indexing, I should be make a non-clustered index that has key columns for stuff in the where clause and everything else selected as a non key column;

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order) 
INCLUDE (id, v1, v2, v3, data1, data2, data3)

My Question here is should I be including v1, v2, v3 in as key columns given that sometimes they are there? IE;

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order, v1, v2, v3) 
INCLUDE (id, data1, data2, data3)

Would that help with performance given that this table is queried, inserted to, updated to, etc. all the time? Or should I have different indexes for each possible iteration (IE an index with a1-3, table_order & v1 as key columns for example)? IE;

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order, v1) 
INCLUDE (id, v2, v3, data1, data2, data3)

CREATE NONCLUSTERED INDEX
ON example_table(a1, a2, a3, table_order, v2) 
INCLUDE (id, v1, v3, data1, data2, data3)

etc.

I should mention here that DB size is of concern so I don't really want to do the multiple indexes if I can help it.

I guess what this really boils down to is that I'm wondering if the query optimizer will use the one index that has all of the key columns listed in all of these different situations?

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
order by table_order

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
and v1 = ?
order by table_order

select id, a1, a2, a3, v1, v2, v3, data1, data2, data3, table_order
from example_table
where a1 = ?
and a2 = ?
and a3 = ?
and v1 = ?
and v2 = ?
order by table_order

etc.

Or am I forced into having to do multiple indexes to satisfy each condition?

Given that the where clause 90% of the time only have a1, a2, a3, should I just include the v columns in as non key columns because they're selected? Does the inclusion of the extra key columns hurt performance for the 90% use case?

Could specifying so many key columns in the order I'm looking to specify cause performance concerns on simple joins like;

inner join example_table
on a1 = ?

or would the query optimizer just not use this index whatsoever for that kind of simple join? If so, I'm going to need a separate index like the follow, right?

CREATE NONCLUSTERED INDEX
ON example_table(a1)

Best Answer

That's a great question.
And there are good answers.

The engine definitely will use the index even if you don't use every key column.
That's especially so if they are in order, as you are talking about.
(can anyone else speak to different orders of key columns?)

You will benefit just fine from selecting just on the first column alone as a key, or multiple columns.
What will make a difference - for any index - is staying inside the INCLUDEd columns.
No matter how many key columns you use in your Where, the performance hit for having to go back to the primary key for additional columns can be huge as it doubles the "operations".

When it comes to dealing with performance vs. size, you have the same problem as with any index.
Since you know you want the same columns returned in all cases, if you are READ focused, you will probably want to the index with all 6, if you INCLUDE everything.
It will certainly save you db size compared to making both indexes.

On WRITE, you obviously have a bigger burden with a larger index. That is a significant additional amount of sorting.
If you do just one row inserted at a time, maybe it won't hardly matter at all.
If you do bulk inserts, you'll definitely want to test the two indexes to see the write performance for your actual inserts.

Related Solutions

Sql-server – Where to place non clustered index

In there interest of showing a generalized example until we see your particular DDL and query, take the below as a basic example:

create table SomeTable
(
    id int identity(1, 1) not null
        primary key clustered,
    AnotherInt int null,
    SomeData nvarchar(1024) null
)
go

insert into SomeTable(AnotherInt, SomeData)
values(null, null)
go 1000

update SomeTable
set 
    AnotherInt = id * 3 + 152,
    SomeData = 'My ID value is ' + cast(id as nvarchar(16))

So we have a test table, SomeTable, with a clustered index. Execute the below query:

select SomeData
from SomeTable
where AnotherInt < 200

This causes a Clustered Index Scan, like you are seeing:

enter image description here

So analyzing the query, we have the column AnotherInt that is in the WHERE clause and being searched on. We are also retrieving the SomeData column, so to prevent a key lookup in the clustered index (or the optimizer may even just use a Clustered Index Scan again), we'll have SomeData as an INCLUDE column:

create nonclustered index IX_SomeData_AnotherInt_SomeData
on SomeTable(AnotherInt) 
include (SomeData)
go

Now, executing the same query above:

select SomeData
from SomeTable
where AnotherInt < 200

SQL Server will utilize that non-clustered index to return the data. It is a covering index, because it won't need to do a lookup on the clustered index for the remaining data:

enter image description here

Through creating a prudent NCI, we have now eliminated the CI Scan in lieu of an Index Seek.

How and what is the most efficient way to join two tables, retaining a particular field from both

It's not entirely clear what you are after but I think a FULL outer join would help:

SELECT 
    COALESCE(a.Component, b.Component) AS Component
  , COALESCE(a.data1, 0) AS data1
  , COALESCE(a.data2, 0) AS data2
  , COALESCE(b.data3, 0) AS data3
  , COALESCE(b.data4, 0) AS data4
FROM 
    table1 AS a
  FULL JOIN
    table2 AS b
      ON b.Component = a.Component ;

If Component is not UNIQUE on one (or both) of the tables, then you could aggreagte and then join:

WITH a AS
  ( SELECT 
        Component
      , COUNT(*) AS cnt
      , SUM(data1) AS sum_data1
      , SUM(data2) AS sum_data2
      -- ...
      , AVG(data1) AS avg_data1
      -- ...
    GROUP BY
        Component
    FROM
        table1
  )
  , b AS
  ( SELECT 
        Component
      , COUNT(*) AS cnt
      , SUM(data3) AS sum_data3
      -- ...
      , AVG(data3) AS avg_data3
      -- ...
    GROUP BY
        Component
    FROM
        table2
  ) 
    SELECT 
        COALESCE(a.Component, b.Component) AS Component
      , COALESCE(a.sum_data1,0) AS sum_data1
      -- ...
      , COALESCE(b.sum_data3,0) AS sum_data3
      -- ...
    FROM 
        a FULL JOIN b
            ON b.Component = a.Component ;

Best Answer

Related Solutions

Sql-server – Where to place non clustered index

How and what is the most efficient way to join two tables, retaining a particular field from both

Related Question