Sql-server – SQL index with additional includes

sql server

In an effort to be more efficient when creating a new index, if there is already an index with the same index keys, but it simply needs additional include columns, is it better to create a second index (again, with the same keys) with different include columns? Or would it be more efficient to the query engine (not to recreate the index) to recreate the index and add the include columns to the original include column set?

If simply adding the include columns to the existing index, does it matter what order the include columns are appended?

Is there a scenario where it would make more sense to create a second index, duplicating the index keys, but with a different set of include columns?

I've attempted to simplify this question in code below.

    CREATE NONCLUSTERED INDEX [IX_Index_A] ON [dbo].[Table]
    ([Col1] ASC)
    INCLUDE ([Col2])

    CREATE NONCLUSTERED INDEX [IX_Index_B] ON [dbo].[Table]
    ([Col1] ASC)
    INCLUDE ([Col3],[Col4])

    DROP INDEX [IX_Index_A] ON [dbo].[Table]

    CREATE NONCLUSTERED INDEX [IX_Index_A] ON [dbo].[Table]
    ([Col1] ASC)
    INCLUDE ([Col2],[Col3],[Col4])

Best Answer

The included field order doesn't matter since there's no sorting involved - those fields are available at the leaf level of the index for every row.

There's not really a disadvantage to consolidating them like in your second example, except for page density on that index (more page reads per number of rows since there's more data stored for each row).

There ARE measurable advantages, though - for updates/inserts on Col1 you only need to update one index instead of two, and you will save a lot of space by not keeping duplicate key entries for col1 in two separate indexes.

The only scenario I can think of that would make sense to have the two narrower indexes is if one of those was designed specifically for a mission critical procedure or view. If you have an index that includes only what you need for a certain query that gets run with high frequency it would make sense to keep it as streamlined as possible.

Related Solutions

Sql-server – How to create implicit select statements

Not sure that this is simpler than a cursor solution, but here is what I would do:

Before we begin we need a few tables in a SQL Fiddle:

MS SQL Server 2012 Schema Setup

CREATE TABLE dbo.tbl1(c1 INT, c2 INT, c3 INT);
CREATE TABLE dbo.tbl2(c4 INT, c5 INT, c6 INT);
CREATE TABLE dbo.tbl3(c7 INT, c8 INT, c9 INT);

First we need the list of all user tables. For that we can use the sys.tables catalog view. The OBJECT_SCHEMA_NAME() function gets us the schema name, the OBJECT_NAME() function the table name. The QUOTENAME() function quotes the names correctly, in case some of those contain special characters, key words or spaces. (In this example the use of OBJECT_NAME() is not strictly necessary as sys.tables has a name column, but I left it in as you can use this pattern with any catalog view that has an object_id column.)

Query 1:

SELECT 
  QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id))+'.'+
  QUOTENAME(OBJECT_NAME(T.object_id)) AS quoted_table_name,
  T.object_id
FROM sys.tables AS T;

Results:

| QUOTED_TABLE_NAME | OBJECT_ID |
|-------------------|-----------|
|      [dbo].[tbl1] | 245575913 |
|      [dbo].[tbl2] | 261575970 |
|      [dbo].[tbl3] | 277576027 |

The next step is to get the list of column names, again quoted. We can use the sys.columns catalog view for that.

Query 2:

SELECT C.name,C.column_id 
  FROM sys.columns AS C
 WHERE C.object_id = OBJECT_ID('dbo.tbl1');

Results:

| NAME | COLUMN_ID |
|------|-----------|
|   c1 |         1 |
|   c2 |         2 |
|   c3 |         3 |

The next hurdle is to get those columns in a comma separated list. There is no string concatenation aggregate function build in so we have to use a trick:

Query 3:

SELECT STUFF((
  SELECT ','+QUOTENAME(name)
    FROM sys.columns AS C
   WHERE C.object_id = OBJECT_ID('dbo.tbl1')
   ORDER BY C.column_id
     FOR XML PATH(''),TYPE
  ).value('.','NVARCHAR(MAX)'),1,1,'') AS clomun_list;

Results:

|    CLOMUN_LIST |
|----------------|
| [c1],[c2],[c3] |

With that all pieces are in place and we just have to put them all together:

Query 4:

SELECT 'SELECT ' +
       CL.column_list +
       ' FROM ' +
       QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id)) + '.' +
       QUOTENAME(OBJECT_NAME(T.object_id)) +
       ';' AS select_statement
FROM sys.tables AS T
CROSS APPLY (
  SELECT STUFF((
    SELECT ','+QUOTENAME(name)
      FROM sys.columns AS C
     WHERE C.object_id = T.object_id
     ORDER BY C.column_id
       FOR XML PATH(''),TYPE
    ).value('.','NVARCHAR(MAX)'),1,1,'') AS column_list
  )CL;

Results:

|                         SELECT_STATEMENT |
|------------------------------------------|
| SELECT [c1],[c2],[c3] FROM [dbo].[tbl1]; |
| SELECT [c4],[c5],[c6] FROM [dbo].[tbl2]; |
| SELECT [c7],[c8],[c9] FROM [dbo].[tbl3]; |

This example will work in SQL 2005 and later, assuming you are on the latest service pack. There is no clean solution to achieve this with SQL Server 2000. In that case you need to go back to your cursor solution.

Sql-server – One index or two

And so enters the art of performance tuning and indexing strategies...

It seems logical to me to amend the existing index definition to include the suggested columns

I'm going to take your quote and write a third index definition:

create index [idx_index3]
on [table1] (col1, col2, col3)
include (col4, col5, col6....);

That should be the CREATE INDEX statement that corresponds to your quoted statement.

That very well may be a prudent solution, but it depends. Here are a couple of examples when I say that it depends.

If you have a common workload that mostly consists of queries like this:

select col1, col2, col3
from table1
where col1 = 1
and col2 = 2
and col3 = 3;

Then your idx_index1 index would be solid. Perfectly narrow, it's an index that satisfies that query with no extraneous data in it (not taking into account the clustered index definition, if one at all).

But if you have workload that consists of queries mainly like the following:

select co11, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2;

Then idx_index2 would be wise, as it is what's called a covering index preventing the need for a key lookup back to the clustered index (or a RID lookup back to the heap). That nonclustered index definition would solely encompass all of the data that query needs.

With your recommendation, it would be well suited for a query like the following:

select co11, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2
and col3 = 3;

Your idx_index3 recommendation would be a covering index that satisfies the search criteria for the above query.

The point I'm trying to get at, is in an isolated question like this we can't answer this definitively. It all depends on what the common and frequent workload is. Of course you could always define all three of these indexes to handle each sample query type, but then comes into question the maintenance that'll be required to keep these indexes updated (think: INSERTs, UPDATEs, DELETEs). That's the overhead of indexes.

You need to dissect and evaluate the workload, and determine where the advantages will be best in place. If the first sample query is the most common by far being executed dozens of times a second, and there is a very infrequent query like the third sample query, then it wouldn't make sense to bloat the leaf level pages of the index with the INCLUDE nonkey columns. It all depends on your workload.

If you understand prudent indexing strategies, and you understand your common workload, then by applying both of those you will be able to come up with what is the best route to take.

Best Answer

Related Solutions

Sql-server – How to create implicit select statements

Sql-server – One index or two

Related Question