Sql-server – Non-clustered index covering a clustered index

clustered-indexindexindex-tuningsql server

I have a table that has the following indexes:

IXC clustered index that includes a single column : A asc.

IX1 that includes two columns : A asc, B asc.

In this case ICX seems like a duplicate index of IX1. Is it a good idea to delete ICX and make IX1 the clustered index in that case? Or is there scenarios where this would be a bad idea?

DDL example :

CREATE TABLE [dbo].[foo] (
    [A]      NVARCHAR (20)  NOT NULL,
    [B]      INT           NULL,
    [C]      NVARCHAR (20)  NOT NULL,
    [D]      NVARCHAR (20)  NOT NULL,
    [E]      NVARCHAR (20)  NOT NULL,
);


GO
CREATE CLUSTERED INDEX [IXC]
    ON [dbo].[foo]([A] ASC);

GO
CREATE NONCLUSTERED INDEX [IX1]
    ON [dbo].[foo]([A] ASC, [B] ASC);

Best Answer

Take a look at this article on SQLShack which gives a great description of what to consider when selecting clustered index keys. To summarise:

  • Keep it short - the clustered index key is included in all non-clustered indexes, so a wide key means more data space required on disk, and in the buffer when reading pages.
  • Keep it static - If your clustering key values are constantly changing, SQL will be forever re-sorting the clustered index to insert in the correct place.
  • Use an incrementing value - An IDENTITY column or sequential value helps speed up the INSERT processes by ensuring the new values are always being added to the end of the index, rather than somewhere in the middle.
  • Make it Unique - If the key column(s) are not unique, then SQL Server will add a uniqueifier. This only adds more space and requires more effort by the DB engine to maintain the index.
  • Sorting - The clustering key also defines the sort order for the data as it is stored. This means picking a clustering key that can or will be used reasonably frequently for sorting will only aid performance down the track and eliminate costly sort operators in query plans. NOTE: Character fields aren't great for sorting performance, so your current clustering key (A) doesn't fare well on this point.

So how do we apply this to your example?

You need to answer the following:

  • Is A+B short enough for a key? Probably, because just A is <= 22 bytes, so A+B is only <=26 bytes. Unless we're talking billions of rows, this is probably fine.
  • Is it static? Only you can answer this. Does the value in A or B change frequently? If so, then they're probably not good candidates for the clustering key.
  • Is A or B an incrementing value? I'm guessing A is unlikely to be an incrementing value as it is an NVARCHAR field, however, B might be since it is an INT.
  • Is A unique? Is B unique? Is A + B unique? If none of those is a yes, then probably not suitable as a clustering key.
  • What order do your queries normally sort in? If its A or B or A+B then these columns may be a good clustering key.

Once you've answered those questions, you should know if A or A+B is the better choice for a clustering key. If A or A+B still seem like equally valid choices for a clustering key, then considering the fact that you have an NCI with A + B would indicate you have a reasonable number of queries using both A and B as a predicate. Given that and the conditions above, A+B is likely a good candidate for a clustering key over just A if they equally satisfy the points from above.