Sql-server – Which columns to place the clustered index on

clustered-index, sql-server, sql-server-2012

I have a system that has been up-and-running for almost 3 years now. Previously (and currently) the front-end was using NHibernate for all of the database access.

Right now, I'm in the process of converting over to Dapper and utilizing stored procedures for EVERYTHING (reads, writes, etc).

Now that NHibernate is gone, I don't believe the current structure of tablename, tablenameid (clustered primary key) is optimal anymore for every table. Lots of my tables will NEVER be accessed via the primary key, UNLESS it's for a delete (even that can be avoided).

So, here are a couple examples of table structures, and which fields I believe should be the clustered index.

Example 1:

Event table
------------------------------------------
EventId int primary key clustered,
Season int not null,
EventDate datetime not null,
EventName varchar(100) not null,
EventType varchar(10) not null,-- (soon to be its own table, but not right now...there are only 2 possible values, Tournament or Dual)
SchoolId int not null,
OpponentId int NULL,
etc...

So currently, I have the clustered index on the primary key. This is one table that is accessed 99% of the time by Season and SchoolId. (The EventId is very rarely used; when it is, it's for a delete.)

Here's where the trickiness comes in. If the EventType is 'Dual', then uniqueness is by Season, EventDate, SchoolId, OpponentId. If the EventType is 'Tournament', then uniqueness is on Season, EventDate, SchoolId, EventName (OpponentId will be null).

Under this architecture, I don't believe I can/should have a unique key, which isn't really my issue…

Am I safe to assume that the clustered index for this table should be on the Season and SchoolId columns?

Example 2:

WrestlerRanking table
-----------------------------
WrestlerRankingId int primary key clustered,
Season int not null,
Week int not null,
IsCurrent bit not null,
WrestlerId int not null,
etc...

In this scenario, similar to the one above, 99% of the time this table is accessed by Season, WrestlerId, and IsCurrent. Uniqueness can be set by Season, Week, and WrestlerId.

Would/should the clustered index for this table be on Season and WrestlerId, even though the majority of joins to this table will include the IsCurrent column? This is definitely not unique as there could be around 20 records for the Season and WrestlerId combination (20 different weeks of rankings).

Best Answer

> Here's where the trickiness comes in. If the EventType is 'Dual', then uniqueness is by Season, EventDate, SchoolId, OpponentId. If the EventType is 'Tournament', then uniqueness is on Season, EventDate, SchoolId, EventName (OpponentId will be null).

I'd look at what's common between these two queries to determine what columns to use in the Clustered Index. Remember, the Clustered index is effectively how the data is arranged on the disk, so it should have a structure similar to the hierarchy that queries against the table will reference. If every query, or the most critical ones, all look at Season, SchoolId and EventDate regardless of EventType, then I'd use those in the Clustered index. Additional Non-clustered covering indexes can be created for specific queries (around EventType or OpponentId, for example) which INCLUDE the columns returned to prevent key lookups.
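As a rough sketch of that layout for the Event table from the question, assuming most queries really do filter on Season, SchoolId and EventDate: the constraint and index names below are made up, the columns behind "etc..." are left out, and the covering index targets a hypothetical query that looks events up by OpponentId.

```sql
-- Sketch only: constraint and index names are hypothetical.
CREATE TABLE dbo.[Event]
(
    EventId    int          NOT NULL,
    Season     int          NOT NULL,
    EventDate  datetime     NOT NULL,
    EventName  varchar(100) NOT NULL,
    EventType  varchar(10)  NOT NULL,
    SchoolId   int          NOT NULL,
    OpponentId int          NULL,
    -- remaining columns from the question omitted
    -- Keep the surrogate key unique, but do not cluster on it.
    CONSTRAINT PK_Event PRIMARY KEY NONCLUSTERED (EventId)
);
GO

-- Arrange the rows in the order most queries filter on.
CREATE CLUSTERED INDEX CIX_Event_Season_SchoolId_EventDate
    ON dbo.[Event] (Season, SchoolId, EventDate);
GO

-- Covering index for a hypothetical lookup by opponent.
-- The clustered key columns are carried in every nonclustered index,
-- so only the extra columns the query returns need to be INCLUDEd.
CREATE NONCLUSTERED INDEX IX_Event_OpponentId
    ON dbo.[Event] (OpponentId)
    INCLUDE (EventName, EventType);
GO
```

On a table that already has a clustered primary key, the existing constraint would have to be dropped and re-created as NONCLUSTERED first, which rebuilds the table, so that is best done in a maintenance window.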

Clustered indexes do not need to be unique, they need to specify how the data should be arranged so that queries can quickly seek to the desired location. The primary key should be unique, but it doesn't have to be a clustered index. If it's just a reference for a row value that doesn't have any useful information that queries will use (i.e. you never join on it), then it should not be part of the clustered index.
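If you do want the database to enforce the conditional uniqueness rule quoted above, one option (not something the question asked for, so treat it as a suggestion) is a pair of filtered unique indexes, available since SQL Server 2008. This sketch assumes dbo.[Event] has the columns listed in the question; the index names are hypothetical.

```sql
-- Assumes dbo.[Event] exists with the columns from the question.
-- Index names are hypothetical.

-- Duals: unique per Season, EventDate, SchoolId, OpponentId.
CREATE UNIQUE NONCLUSTERED INDEX UX_Event_Dual
    ON dbo.[Event] (Season, EventDate, SchoolId, OpponentId)
    WHERE EventType = 'Dual';
GO

-- Tournaments: unique per Season, EventDate, SchoolId, EventName.
CREATE UNIQUE NONCLUSTERED INDEX UX_Event_Tournament
    ON dbo.[Event] (Season, EventDate, SchoolId, EventName)
    WHERE EventType = 'Tournament';
GO
```

One caveat: writes to a table with filtered indexes require certain session SET options (QUOTED_IDENTIFIER and ANSI_NULLS ON, among others), so stored procedures created with those options off will fail on insert or update.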

For the second example, I would probably use Season, Week, and WrestlerId if those are most queried. The order should follow the logical hierarchy that the data is queried in...which brings me to my next point:
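A sketch of what that could look like for WrestlerRanking, using the columns from the question. Because the asker states that Season, Week and WrestlerId are unique together, the clustered index is declared UNIQUE here. The filtered index on current rows is an extra assumption based on the asker's remark that most joins include IsCurrent; all names are hypothetical.

```sql
-- Sketch only: constraint and index names are hypothetical.
CREATE TABLE dbo.WrestlerRanking
(
    WrestlerRankingId int NOT NULL,
    Season            int NOT NULL,
    [Week]            int NOT NULL,
    IsCurrent         bit NOT NULL,
    WrestlerId        int NOT NULL,
    -- remaining columns from the question omitted
    CONSTRAINT PK_WrestlerRanking PRIMARY KEY NONCLUSTERED (WrestlerRankingId)
);
GO

-- The clustered index doubles as the uniqueness rule from the question.
CREATE UNIQUE CLUSTERED INDEX CIX_WrestlerRanking_Season_Week_WrestlerId
    ON dbo.WrestlerRanking (Season, [Week], WrestlerId);
GO

-- Small index holding only the current ranking rows (assumption:
-- most joins filter on IsCurrent = 1).
CREATE NONCLUSTERED INDEX IX_WrestlerRanking_Current
    ON dbo.WrestlerRanking (Season, WrestlerId)
    WHERE IsCurrent = 1;
GO
```

Whether Week or WrestlerId should come second in the clustered key depends on which one your queries filter on more often, which is exactly what the execution plans will tell you.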

All this advice is contingent on the execution plans of your queries.

Always, ALWAYS, use actual execution information to determine what the best indexes for your queries are. Got some table scan issues? Implement a clustered index. Got a clustered index scan slowing you down? Look into implementing a non-clustered covering index for that query based on the columns it's looking up (often suggested by the DB engine itself). Got a key lookup? Try adding an INCLUDE clause with the looked-up columns to the affected index. Indexing hypothetically, before you're super familiar with the DB engine, often leads to wasted time and rework. I'm not saying to shut your brain off and just blindly follow the query plan's suggested indexes, but use the query plan to build your indexing strategy rather than trying to preempt it or second-guess it.
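One way to gather that information, sketched under a few assumptions: the SELECT is a made-up query against the Event table from the question (the literal values are placeholders), and the second query reads the missing-index DMVs, which need VIEW SERVER STATE permission and are reset when the instance restarts.

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Hypothetical query; run it with "Include Actual Execution Plan"
-- enabled in SSMS to see seeks, scans and key lookups.
SELECT EventId, EventDate, EventName
FROM dbo.[Event]
WHERE Season = 2014
  AND SchoolId = 42;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

-- Index suggestions the optimizer has recorded since the last restart.
-- Treat these as hints to investigate, not as a to-do list.
SELECT TOP (20)
    mid.[statement]        AS TableName,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
```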

Related Solutions

Sql-server – Which should I use on the DB? – Clustered Index or Non Clustered Index or Both

You may believe a primary key is unnecessary, but in my experience it is critical. You may never use it. ~~But the database itself uses the primary key to determine PHYSICAL organization of the data.~~ Correction: I have been corrected here - in SQL Server, the physical organization is based on the clustered index. By default a primary key IS the clustered index, so by adding a primary key, the clustered index is added and the rest of my point is still correct. But if you don't want to use an autonumber or some other primary key as the clustered index, you CAN still use some other column to cluster. You should definitely pick one, though. Check out this link: http://technet.microsoft.com/en-us/library/ms186342.aspx

Without a ~~primary key~~ clustered index you risk corruption of your data. And in some cases, when it seems totally unnecessary, adding a ~~primary key~~ clustered index that you don't even search on will dramatically improve query performance.

Here is an excellent description of the basic indexes in SQL Server: https://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/

Based on the basics you have given us, I would say you need something like a primary key (auto number works) that will be your clustered index, then several nonclustered indexes (on all three of your search fields), which are the typical index when you want to search on various columns that are not necessarily the primary key.
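A minimal sketch of that arrangement. The original question's schema is not shown here, so the table, the three search columns, and every name below are hypothetical stand-ins.

```sql
-- Hypothetical table and column names throughout.
CREATE TABLE dbo.Contact
(
    ContactId  int IDENTITY(1,1) NOT NULL,
    LastName   varchar(50) NOT NULL,
    City       varchar(50) NOT NULL,
    PostalCode varchar(10) NOT NULL,
    -- Auto-number primary key that also serves as the clustered index.
    CONSTRAINT PK_Contact PRIMARY KEY CLUSTERED (ContactId)
);
GO

-- One nonclustered index per search field.
CREATE NONCLUSTERED INDEX IX_Contact_LastName   ON dbo.Contact (LastName);
CREATE NONCLUSTERED INDEX IX_Contact_City       ON dbo.Contact (City);
CREATE NONCLUSTERED INDEX IX_Contact_PostalCode ON dbo.Contact (PostalCode);
GO
```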

If data will be edited a lot (like when a software application is used to manage data), a lot of indexes (i.e., nonclustered indexes) will slow down insert/update/delete activity, because the indexes are updated when the table is. But if your focus is on selecting data, indexes are critical.

If this is updated in batch daily and is not part of a software system, you could have a routine which drops the indexes, does a batch insert, then recreates the indexes.
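Such a routine might look roughly like this. All object names are hypothetical, and the setup section exists only to make the sketch self-contained; in practice the target table, staging table and index would already be there.

```sql
-- Setup (hypothetical objects, only so the sketch runs on its own).
CREATE TABLE dbo.ImportTarget
(
    Id          int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ImportTarget PRIMARY KEY CLUSTERED,
    SearchValue varchar(50) NOT NULL
);
CREATE TABLE dbo.ImportStaging
(
    SearchValue varchar(50) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_ImportTarget_SearchValue
    ON dbo.ImportTarget (SearchValue);
GO

-- Daily batch routine:
-- 1. Drop the nonclustered index so the insert does not maintain it row by row.
DROP INDEX IX_ImportTarget_SearchValue ON dbo.ImportTarget;

-- 2. Load the batch.
INSERT INTO dbo.ImportTarget (SearchValue)
SELECT SearchValue
FROM dbo.ImportStaging;

-- 3. Re-create the index in a single pass over the loaded data.
CREATE NONCLUSTERED INDEX IX_ImportTarget_SearchValue
    ON dbo.ImportTarget (SearchValue);
GO
```

Whether this beats loading with the indexes in place depends on the batch size relative to the table, so it is worth timing both.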

But regardless - what you want for your search fields are nonclustered indexes. Clustered indexes "cluster" an entire record around the primary key (or whatever field is chosen as the cluster basis). Nonclustered indexes are what you get when you index any other field.

Sql-server – Skewed Clustered Index

Primary Key and Clustered Index are really separate concepts and, although many tables put both of these attributes on the same constraint or index, this is not a requirement.

I do not see how changing the Clustered Index to make it also the Primary Key will particularly help your performance.

It sounds like you have a Primary Key (BIGINT) already and that it is physically implemented through a unique non-clustered index. This should be good for fairly quick lookups of the Primary Key since it would be a relatively narrow index.

As you have it defined now, the skewed Clustered Index is mostly the same special non-NULL value, with about 17.5K rows.

Since you identify the Clustered Index as a likely pain point, you might look into whether a Filtered Index or an INDEXED VIEW would provide the needed performance improvement.

Sample of a Filtered Index based on MSDN, something like this:

CREATE NONCLUSTERED INDEX YourNonClusteredFilteredIndex
    ON dbo.YourTable (ClusterKeyCol)
    WHERE ClusterKeyCol IS NOT NULL; -- Or <> {special value}
GO

Sample of an Indexed View based on MSDN, something like this:

SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
    QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO
CREATE VIEW dbo.YourClusteredView
WITH SCHEMABINDING
AS
    SELECT ClusterKeyCol, PrimaryKeyCol
    FROM dbo.YourTable
    WHERE ClusterKeyCol <> {special value};
GO
--Create an index on the view.
CREATE UNIQUE CLUSTERED INDEX YourClusteredIndex 
    ON dbo.YourClusteredView (ClusterKeyCol);
GO

If you have a more detailed problem description, as John M asked, then perhaps some more targeted help can be provided.
