Sql-server – Missing Non Clustered Index already part of Clustered Index

execution-planindexsql serversql-server-2016

I'm debugging a slow running query and in the execution plan a non-clustered index is suggested, with 51.6648 Impact. However, the non-clustered index only includes columns that are already in the Primary Key (PK) Composite Clustered Index.

Could this be because of the order of the columns in the index? i.e. if the columns in the clustered index are not in order from most selective to least then is there potential for a non-clustered index to improve performance?

In addition the non-clustered index only contains two of the three PK columns with the third added as an included column. Is the include another reason why use of the non-clustered index could be more optimal?

Below is an example of the table structures I am working with:

Tables-

Retailers (
    RetailerID int PK, 
    name ...)

Retailer_Relation_Types (
    RelationType smallint PK, 
    Description nvarchar(50) ...)

Retailer_Relations (
    RetailerID int PK FK, 
    RelatedRetailerID int PK FK, 
    RelationType smallint PK FK, 
    CreatedOn datetime ...)

The table Retailer_Relations has the following composite PK index and suggested index-

CONSTRAINT PK_Retailer_Relations 
PRIMARY KEY CLUSTERED (
    RetailerID ASC, 
    RelatedRetailerID ASC, 
    RelationType ASC
    ) ON [PRIMARY]

CREATE NONCLUSTERED INDEX <NameOfIndex> 
ON Retailer_Relations (
    RetailerID, 
    RelationType
    ) 
INCLUDE (
    RelatedRetailerID
    )

Best Answer

The table Retailer_Relations has the following composite PK index and suggested index-

While missing indexes could be helpful and could definitely work, I would not spend too much time on missing indexes, these hints are created on the estimated execution plan, not on the actual execution plan.

More precisely, these index hints are based on the premise of reducing the cost of Query Bucks™ used by operators in the plan. The optimizer calculates the estimated costs, and adds missing index hints accordingly.

As a result they could be very wrong. If you are unsure if it is going to help, the best thing to do is test the situation before and after. You could do this by adding the statement SET STATISTICS IO, TIME ON; before running the query.

Also, you could use statisticsparser to make it easier to read these statistics.

Could this be because of the order of the columns in the index?

That is correct, creating the missing index can improve the selectivity on queries, for example if your query looks like this:

SELECT  RelatedRetailerID
FROM Retailer_Relations 
WHERE
RetailerID = 5 AND
RelationType = 20;

or like this:

SELECT  RelatedRetailerID
FROM Retailer_Relations 
ORDER BY
RetailerID,
RelationType;

The reasoning behind this is that both indexes could seek on RetailerID, that part is not going to change. But what if extra filters/ordering is applied on RelationType? It would be all over the place in the clustered index, as a result of it being the third key value, not the second key value. And as we know, it is the second key value in the NCI.

Okay, but when or how would the nonclustered index improve the query?

A couple of cases could be:

  • If relationType filters a lot of values, the residual I/O could be high, resulting in the possible need of the nonclustered index (Query #1)
  • Ordering on the two columns occurs (One way), and the resultset is large (Query #2).
  • As @AaronBertrand mentioned: if the CI size difference compared to the NCI is of a considerable amount, adding the NCI will reduce the pages read by queries that benefit from it.

NCI Side note

As a side note, adding the key columns to the include list in your NCI is not exactly needed, since CI key columns are automatically included in all Non clustered indexes.

You could opt to do so if you are not sure if the clustered index will remain the same, and want the column to always be included.

Regarding the query itself, if you added the execution plan via PasteThePlan we could give some more information on indexing / improving the query.


Testing

Create table and add some rows

CREATE TABLE Retailer_Relations (
    RetailerID int , 
    RelatedRetailerID int , 
    RelationType smallint, 
    CreatedOn datetime,
    CONSTRAINT PK_Retailer_Relations 
PRIMARY KEY CLUSTERED (
    RetailerID ASC, 
    RelatedRetailerID ASC, 
    RelationType ASC
    ) ON [PRIMARY])


    DECLARE @I Int = 1
    WHILE @I < 1000
    BEGIN
    INSERT INTO Retailer_Relations(RetailerID,RelatedRetailerID,RelationType,CreatedOn)
    VALUES(@I,@I,@I,GETDATE()
    )
    set @I += 1
    END

Query #1

    SELECT  RelatedRetailerID
FROM Retailer_Relations 
WHERE
RetailerID = 5 AND
RelationType = 20;

Plan without index Here

While it is doing a seek, it is doing a seek on RetailerID. Afterwards it is issueing a residual I/O predicate on RelationType

Add the index

CREATE NONCLUSTERED INDEX IX_TEST
ON Retailer_Relations (
    RetailerID, 
    RelationType
    ) 
INCLUDE (
    RelatedRetailerID
    )

The residual predicate is gone, everything happens in a seek predicate, on both columns.

Execution plan

With the second query, the added index helpfullness becomes even more obvious:

SELECT  RelatedRetailerID
FROM Retailer_Relations 
ORDER BY
RetailerID,
RelationType;

Plan without the index, with a Sort operator:

enter image description here

Plan with the index, using the index removes the sort operator

enter image description here