Clustered Index / Index Oriented Table performance in join, worse than nonclustered

clustered-indexjoin;nonclustered-indexperformance

Can a clustered index (or IOT in Oracle) be detrimental, when to be used on a very "broad" table, but only few columns are used? In this case, the "Product" table is used only like a junction table between "ProductCategory" and "Sales".

If there was a nonclustered index on pr.ID and pr.CategoryID, the DBMS would do an Index-only-check, which has a very good performance. But, if I am right, a clustered index actually IS the entire table, ordered by the index columns. So, even if the clustered index had pr.ID and pr.CategoryID as it's index columns, the database would still have to load the entire table with all the heavy nvarchar(4000/max) stuff, only for two small columns.

-- get total sales amounts for product categories
SELECT pc.ID, pc.Name, SUM(sl.Amount) AS TotalSalesAmount 
FROM ProductCategory pc
INNER JOIN Product pr ON pc.ID=pr.CategoryID
INNER JOIN Sales sl ON pr.ID=sl.SoldProductID
GROUP BY pc.ID, pc.Name

with Product being a heavy table like this:

CREATE TABLE Product 
(
    ID int not null PRIMARY KEY, -- clustered index 1st column
    CategoryID int FOREIGN KEY REFERENCES ProductCategory(ID), -- clustered idx 2nd.
    Name                 nvarchar(200)
    RecommendedPrice     decimal(11,2),
    Creator              nvarchar(200),
    SafetyReport         nvarchar(4000),
    FutureDevelopmentsProposal nvarchar(4000),
    ExpectedSalesSurvey  nvarchar(4000),
    CrystalBallVision    nvarchar(4000), -- nonsense to represent a bloated table
    TarotCardsResult     nvarchar(4000),
    Horoscope            nvarchar(4000),
    FortuneTellerReport  nvarchar(max)
)

One remarkable thing I found out with a similar query on SQL Server 2008 R2:
The query plan contained an index scan on a completely unrelated, nonclustered index, like one on the pr.RecommendedPrice column only.

My idea is that the unrelated, nonclustered index contains references to the clustered index rows (pr.ID, pr.CategoryID), and it's cheaper to get these from a nonclustered index scan, rather than from the actual clustered index.

Am I right in my assumptions?

Best Answer

You are correct Erik. In case a large portion of the leaf level of the clustered index needs to be read, the size of the other columns affects the amount of data that needs to be read, since the leaf level of a clustered index contains the table pages.

Nonclustered indexes contain the clustered index values for the ability to perform a lookup when a column that is not in the nonclustered index needs to be fetched. The optimizer can leverage this in order to fetch the clustered index values from there when it decided it's cheaper to do that.