Clustered Index / Index-Organized Table performance in join, worse than nonclustered

clustered-index; join; nonclustered-index; performance

Can a clustered index (or an IOT in Oracle) be detrimental when it is used on a very "broad" table of which only a few columns are needed? In this case, the "Product" table serves only as a junction table between "ProductCategory" and "Sales".

If there were a nonclustered index on pr.ID and pr.CategoryID, the DBMS could do an index-only scan, which performs very well. But, if I am right, a clustered index actually IS the entire table, ordered by the index columns. So, even if the clustered index had pr.ID and pr.CategoryID as its index columns, the database would still have to load the entire table with all the heavy nvarchar(4000/max) stuff, only for two small columns.

-- get total sales amounts for product categories
SELECT pc.ID, pc.Name, SUM(sl.Amount) AS TotalSalesAmount 
FROM ProductCategory pc
INNER JOIN Product pr ON pc.ID=pr.CategoryID
INNER JOIN Sales sl ON pr.ID=sl.SoldProductID
GROUP BY pc.ID, pc.Name

with Product being a heavy table like this:

CREATE TABLE Product 
(
    ID int not null PRIMARY KEY, -- clustered index 1st column
    CategoryID int FOREIGN KEY REFERENCES ProductCategory(ID), -- clustered idx 2nd.
    Name                 nvarchar(200),
    RecommendedPrice     decimal(11,2),
    Creator              nvarchar(200),
    SafetyReport         nvarchar(4000),
    FutureDevelopmentsProposal nvarchar(4000),
    ExpectedSalesSurvey  nvarchar(4000),
    CrystalBallVision    nvarchar(4000), -- nonsense to represent a bloated table
    TarotCardsResult     nvarchar(4000),
    Horoscope            nvarchar(4000),
    FortuneTellerReport  nvarchar(max)
)

One remarkable thing I found out with a similar query on SQL Server 2008 R2:
The query plan contained an index scan on a completely unrelated, nonclustered index, like one on the pr.RecommendedPrice column only.

My idea is that the unrelated, nonclustered index contains references to the clustered index rows (pr.ID, pr.CategoryID), and it's cheaper to get these from a nonclustered index scan, rather than from the actual clustered index.

Am I right in my assumptions?

Best Answer

You are correct, Erik. When a large portion of the leaf level of the clustered index needs to be read, the size of the other columns affects the amount of data that has to be read, because the leaf level of a clustered index consists of the table's data pages.
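
One way to see how much the wide columns inflate the clustered index is to compare page counts per index. The following is only a sketch: it assumes the table lives in the dbo schema (the question does not say) and reads the sys.dm_db_partition_stats view.

```sql
-- Page counts per index on Product; the schema name dbo is an assumption.
SELECT i.name                          AS index_name,
       i.type_desc,                    -- CLUSTERED / NONCLUSTERED
       ps.in_row_data_page_count,      -- leaf pages holding in-row data
       ps.row_overflow_used_page_count,
       ps.lob_used_page_count,         -- off-row nvarchar(max) storage
       ps.used_page_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.Product')
ORDER BY ps.used_page_count DESC;
```

A large gap between the clustered index and a narrow nonclustered index in this output would be consistent with the optimizer preferring to scan the narrow one.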

Nonclustered indexes contain the clustered index key values so that a lookup can be performed when a column that is not in the nonclustered index needs to be fetched. The optimizer can take advantage of this and read the clustered index key values from the nonclustered index when it decides that is cheaper.
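
As a sketch of the narrow-index alternative the question describes: assuming the clustered index key is ID (which is what the PRIMARY KEY in the question's DDL produces by default in SQL Server), a nonclustered index on CategoryID alone already carries ID in its leaf rows. The index name IX_Product_CategoryID is made up for illustration.

```sql
-- Hypothetical index name. The leaf rows hold CategoryID plus the
-- clustered key (ID), so the join below needs no wide columns.
CREATE NONCLUSTERED INDEX IX_Product_CategoryID
    ON Product (CategoryID);

-- The query from the question, unchanged in meaning.
SELECT pc.ID, pc.Name, SUM(sl.Amount) AS TotalSalesAmount
FROM ProductCategory pc
INNER JOIN Product pr ON pc.ID = pr.CategoryID
INNER JOIN Sales sl ON pr.ID = sl.SoldProductID
GROUP BY pc.ID, pc.Name;
```

Whether the optimizer actually uses the new index depends on its cost estimates, so it is worth comparing the execution plan before and after.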

Related Solutions

Sql-server – Is a clustered index locked while a (clustered index) scan is in progress

There are many many many more factors at play.

Isolation level. Locking behavior differs wildly between isolation levels. Some don't take read locks at all (read uncommitted, snapshot, RCSI). The default, read committed, transiently locks the rows it reads, only for as long as necessary. Repeatable read and serializable hold on to their locks and can end up locking everything, and many developers deploy serializable without ever realizing they do so.
Scan purpose. Scans for read are compatible with each other, so they don't block one another no matter what they lock. If you have scans blocking other scans, they must be scans for update, which are incompatible with each other. Again, the snapshot isolation levels (including RCSI) do not block even when doing a scan for update.
Lock granularity, based on cardinality estimates. Scans may choose row, page or rowset level granularity. A table scan will likely choose page locks.
Index page locks/row locks configuration, which you inadvertently changed. You should revert the change, since it is not based on any measurement or root cause analysis. Gut feeling has no role in an investigation.
Lock escalation.
Row stability requirements when off-row LOB data is present.
Other.
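
For the page/row lock configuration item in the list above, the current settings can be inspected and put back to their defaults roughly as follows. This is a sketch: dbo.tablename is a placeholder, since the original post does not name the table.

```sql
-- Show the row/page lock options for every index on the table
-- (dbo.tablename is a placeholder name).
SELECT name, allow_row_locks, allow_page_locks
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.tablename');

-- Restore the defaults (both options ON) for all indexes on the table.
ALTER INDEX ALL ON dbo.tablename
    SET (ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
```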

I suggest you follow the procedure described in Capturing wait stats for a single operation to capture the wait stats of the scans you observed as blocking. See if they indeed block on locks held by the table scan operation. If the scenario is truly as you described (a read scan vs. other read operations), then there is no reason for blocking, so something else must be at play. You can also give sp_whoisactive a shot.
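
As a quick first look before capturing full wait stats, the built-in DMVs can show who is blocked and on what. This is a minimal sketch and not the procedure from the linked article; run it while the blocking is happening.

```sql
-- Requests that are currently blocked by another session.
-- wait_type values starting with LCK_M_ indicate lock waits.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,                    -- milliseconds
       r.wait_resource,
       s.transaction_isolation_level   -- 1 = read uncommitted ... 5 = snapshot
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = r.session_id
WHERE r.blocking_session_id <> 0;
```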

Sql-server – Clustered Table Scan Because of “SELECT *”

If you need columns in the output that aren't covered by the index, the optimizer has to make a choice:

Perform a table / clustered index scan (therefore all columns are there)
Perform a seek, then perform lookups to retrieve the columns not covered

Which way it will choose depends on a variety of things, including how narrow the index is, how many rows match the predicate, etc. You can force a seek with the FORCESEEK hint, but I suspect it will end up performing the same or worse than the scan SQL Server has chosen in your case.
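
For reference, a FORCESEEK hint looks roughly like this. The table name, the column col1, and the value 42 are placeholders, and the sketch assumes an index exists with col1 as its leading key.

```sql
-- Placeholder names; assumes an index whose leading key column is col1.
SELECT *
FROM dbo.tablename WITH (FORCESEEK)
WHERE col1 = 42;
```

If no plan with a seek is possible for the query, the statement raises an error instead of falling back to a scan, which is one more reason to treat the hint as a diagnostic tool rather than a fix.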

Some options:

Change the app to run a proper query. I listed this first for a reason.
Create a view that selects only the columns you need:
```
CREATE VIEW dbo.myview
WITH SCHEMABINDING
AS
  SELECT col1, col2, col3 FROM dbo.tablename;
```
Then you can change the app to SELECT * from this view. Or you can get even more creative and rename the original table, and change the name of this view to what the name of the table used to be. Breaking change, obviously; proceed with caution.
Add all of the other columns to the key or INCLUDE list for the index. If these are hard-coded values and always the ones used, you may consider a filtered index.
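
The last option could be sketched like this. All table, column, and index names are placeholders, and the filter value 42 is made up to stand in for a hard-coded predicate.

```sql
-- Covering index: col1 is the key, col2 and col3 are carried at the
-- leaf level so no lookup is needed for them.
CREATE NONCLUSTERED INDEX IX_tablename_col1_covering
    ON dbo.tablename (col1)
    INCLUDE (col2, col3);

-- Filtered variant for a predicate that always uses the same value.
CREATE NONCLUSTERED INDEX IX_tablename_col1_filtered
    ON dbo.tablename (col1)
    INCLUDE (col2, col3)
    WHERE col1 = 42;
```

With SELECT *, the index only covers the query if every column of the table appears in the key or INCLUDE list, which on a wide table amounts to storing a second copy of the data.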
