Sql-server – SQL Server 2008: Unused Clustered Index

index-tuningsql serversql-server-2008-r2

I have an MS SQL Server 2008 R2 database that contains a table that has just under 100 million rows & is both read and write heavy.

It contains Tweets, and the id of each Tweet is one of the fields, but it is not used as the primary key, instead it has a non-clustered index on it & there is a separate field for the primary key "localPkId" (also the clustered index) that is just an identity and isn't actually used anywhere (so all other tables will still join on the Tweet ID, ignoring the "localPkId" field).

The Tweets that get inserted will come into the system out of order, so by not using Tweet ID as the clustered index, it means new rows will always get put at the end, speeding up inserts.

This had been working fine for some time as the table had a predominantly write heavy workload, but now it's being used for a lot of reads as well and I'm concerned that as the workload continues to grow this will become a problem.

First question: Does having "localPkId" offer any advantage over the table being a heap/having no clustered index?

Second question: Should the Tweet ID be the Primary Key & Clustered Index & then set a low fillfactor? If so, how would I go about choosing an appropriate value for the fillfactor in a case like this?

Best Answer

But you have an index on TweetID. Is that index not used by the joins?

Even a non-clustered index will fragment with insert and update. As that index fragments inserts take longer. How are you managing fill factor and index maintenance on this non-clustered index?

An unused PK does nothing.

TweetID as PK will save some space. What % of rows get added daily. If only 1% rows get added daily you could use a fill factor of 90% and defrag daily or weekly. Start with like 80% or 90% and see how fast it fragments.

If the TweetID are out of order but in the same range then the problem you have there is fill factor gets hammered in that range and unused in the rest of the range. But a defag is faster as it is only cleaning up that part of the range. In that case you may be better off with 100% fill factor and a regular index reorganize.

There is no cut and dry answer. You need to manage and monitor the index for your data and your workload.

Related Solutions

Sql-server – SQL Server Primary key / clustered index design decision

You are correct to separate "clustered index" from "primary key":

A clustered index is the organisation of data on disk is better if
- narrow
- numeric
- increasing (strictly monotonic)
The primary key identifies a row

Note: GUIDs make poor clustering keys

In this case, with the surrogate column, the table has 2 candidate keys:

ProductHistoryID
ProductNo + CreatedDateTime

Assumed convention states that the ProductHistoryID becomes the PK, but you can leave the PK on (ProductNo, CreatedDateTime): it will just be non-clustered. Which leads to indexes:

clustered index should be on ProductHistoryID
unique non-clustered index on (ProductNo, CreatedDateTime)

Example

CREATE TABLE Product (
    ProductHistoryID int NOT NULL IDENTITY (1,1) NOT NULL,
    ProductNo ...
    CreatedDateTime ...

then you a choice of

    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductHistoryID)
    CONSTRAINT UQ_Product UNIQUE NONCLUSTERED (ProductHistoryID)

    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime)
    CONSTRAINT PK_Product UNIQUE CLUSTERED (ProductHistoryID)

Also, the pattern you have is a "type 2 Slowly Changing Dimension"

Sql-server – How does SQL Server choose an index key for a foreign key reference

The (lack of) documentation suggests that this behaviour is an implementation detail, and is therefore undefined and subject to change at any time.

This is in stark contrast to CREATE FULLTEXT INDEX, where you have to specify the name of an index to attach to -- AFAIK, there is no undocumented FOREIGN KEY syntax to do the equivalent (though theoretically, there could be in the future).

As mentioned, it does make sense that SQL Server chooses the smallest physical index with which to associate the foreign key. If you change the script to create the unique constraint as CLUSTERED, the script "works" on 2008 R2. But that behaviour is still undefined and should not be relied upon.

As with most legacy applications, you'll just have to get down to the nitty-gritty and clean things up.

Best Answer

Related Solutions

Sql-server – SQL Server Primary key / clustered index design decision

Sql-server – How does SQL Server choose an index key for a foreign key reference

Related Question