Sql-server – Skewed Clustered Index

clustered-indexperformancequery-performancesql server

I have a SQL Server database that has a pretty important table (join wise) that contains just over 2.1 million rows. For the VARCHAR column that defines the clustered index, 17.5k rows contain unique values, the remaining rows all contain exactly the same non null value. There are some queries that start at this table restricting by this column, expecting to get just one row, so that probably helps performance, but for many queries we are accessing via other fields and nearly always joining by a unique, non nullable (BIGINT) primary key (which is obviously non clustered).

I guess my question is, in light of anything else is it a no-brainer to change the clustered index to be the primary key (in light of there not being a more suitable candidate). Is there any check that I could do to try and predict the outcome (spinning up an exact replica and testing in a real world way is not an option)

I know that any changes I make will require a table rebuild (if I'm changing the clustered key)

Best Answer

Primary Key and Clustered Index are really separate concepts and, although many tables put both of these attributes on the same constraint or index, this is not a requirement.

I do not see how changing the Clustered Index to the make it also the Primary Key will particularly help your performance.

It sounds like you have a Primary Key (BIGINT) already and that it is physically implemented through a unique non-clustered index. This should be good for fairly quick lookups of the Primary Key since it would be a relatively narrow index.

As you have it defined now, the skewed Clustered Index is mostly the same special non-NULL value, with about 17.5K rows.

Since you identify the Clustered Index as a likely pain point, you might look into whether a Filtered Index or an INDEXED VIEW would provide the needed performance improvement.

Sample of a Filtered View based on MSDN, like this:

CREATE NONCLUSTERED INDEX YourNonClusteredFilteredIndex
    ON dbo.YourTable (ClusterKeyCol)
    WHERE ClusterKeyCol IS NOT NULL; -- Or <> {special value}
GO

Sample of an Index View based on MSDN, something like this:

SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
    QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO
CREATE VIEW dbo.YourClusteredView
WITH SCHEMABINDING
AS
    SELECT ClusterKeyCol, PrimaryKeyCol
    FROM dbo.YourTable
    WHERE PrimaryKeyCol <> {special value};
GO
--Create an index on the view.
CREATE UNIQUE CLUSTERED INDEX YourClusteredIndex 
    ON dbo.YourClusteredView (ClusterKeyCol);
GO

If you have a more detailed problem description, as John M asked, then perhaps some more targeted help can be provided.