Sql-server – Excluding clustered key columns from non-clustered indexes definitions

indexsql serversql server 2014sql-server-2012

As I've read the bookmark or the reference of the non-clustered index to the table clustered index is the clustered index key itself. On the SQL Saturday event I've attended, one of the lecturers said it's always good practice to exclude the clustered key columns from the non-clustered index key definition and include clause, too.

I am wondering are there any negatives of doing so, for the following scenario:

  1. Table A has the following columns: RecordID, QuestionID, Pts, PtsOf and many other.
  2. I have created a non-clustered index like this:

    CREATE NONCLUSTERED INDEX [IX_TableA_RecordID_QuestionID] ON [dbo].[TableA]
    (
        [RecordID] ASC
       ,[QuestionID] ASC
    )INCLUDE ([Pts],[PtsOf]);
    

    in order to skip reading the clustered index when score is calculated per records and questions. The [RecordID][QuestionID] pair is the clustered index key.

If I change the index like this:

CREATE NONCLUSTERED INDEX [IX_TableA_RecordID_QuestionID] ON [dbo].[TableA]
(
    [Pts] ASC
   ,[PtsOf] ASC
);

The indexes is again used as before (it satisfies the search criteria) even for queries that are not referring the points columns:

SELECT [RecordID]
      ,[QuestionID]
FROM [dbo].[TableA];

SELECT [RecordID]
FROM [dbo].[TableA];

SELECT [QuestionID]
FROM [dbo].[TableA];

So, the engine is smart enough to see using the non-clustered index will reduce the reads and I know leaving it as the original definition is working, too, but some of the indexes are quite big and I think removing some of the keys can reduce their size.

What I am concerned about is that the first index rows were ordered by the bookmark and now they are not. I am wondering if and how this could harm the performance?

Best Answer

On the SQL Saturday event I've attended, one of the lecturers said it's a always good practice to exclude the clustered key columns from the non-clustered index key definition and include clause, too.

I disagree. Let me explain with the table in your question:

The clustered index is: (RecordID, QuestionID) and there are many more columns.

Any non-clustered index will also have the clustered key columns, too, appended in the end. So, an index like:

(Pts)   is equivalent to:   (Pts, RecordID, QuestionID)

and similarly:

(Pts, PtsOf)           <->:  (Pts, PtsOf, RecordID, QuestionID)

(Pts, RecordId)        <->:  (Pts, RecordID, QuestionID)

(Pts, QuestionID)      <->:  (Pts, QuestionID, RecordID)

(Pts) INCLUDE (PtsOf)  <->:  (Pts, RecordID, QuestionID) INCLUDE (PtsOf) 

(QuestionID, RecordID) <->:  (QuestionID, RecordID)

For joint tables or like this one that have a composite primary/unique key - whether it is clustered or not, it's very often to have queries that need the (a,b) index and others that will use better the (b,a) index, sometimes queries that need both. So, one often needs both of these.

If the composite clustered key has more than two columns - say (a,b,c) - it's often that you may need an index on (b,c,a) or in (d,b) and another on (e,c,a) (which of course will be equivalent to (b,c,a), (d,b,a,c) and (e,c,a,b) respectively.) You can't just remove these columns from the definitions of the non-clustered keys because the column order will change.

The suggestion has one good point though. The clustered key columns can be removed from the INCLUDE part. They are redundant, just noise there.


About the indexes in the question, a non-CI on (Pts, PtsOf) is equivalent to (Pts, PtsOf, RecordID, QuestionID), so it is very different than the original non-CI on (RecordID, QuestionID) INCLUDE (Pts, PtsOf). It will use a bit more space than the original and of course these two indexes will be useful for different types of queries.

  • The (Pts, PtsOf) will be be useful, for example, for queries with WHERE Pts BETWEEN 0 AND 100, WHERE Pts = 200 AND PtsOf = 300, etc.

  • The (RecordID, QuestionID) INCLUDE (Pts, PtsOf) is basically a copy of the table with only the 2 clustered key columns and 2 only extra columns (of the many). This is (rarely) useful and it's a form of vertical partitioning. If you often have queries that need all the rows of the table but only these 2 columns, then it's probably one these (rather rare) cases where the extra space and effort to maintain this index is justified.