Sql-server – Does the table need a primary key and clustered index change

clustered-indexoptimizationprimary-keysql-server-2008-r2

I have a table holding 1.7 Milion rows.

Definition:

CREATE TABLE T(
[ID] [uniqueidentifier] NOT NULL,
[AID] [uniqueidentifier] NOT NULL,
[BID] [uniqueidentifier] NOT NULL,
[iType] [int] NOT NULL,
[MT] [ntext] NOT NULL,
[isM] [tinyint] NOT NULL,
[CDate] [datetime] NOT NULL,
[CBy] [nvarchar](50) NOT NULL,
[xi] [tinyint] NOT NULL 

There is no Primary Key and no clustered index. There is one non clustered index on a uniqueidentifier column (that is not the PK candidate).

Looking at the usage statistics I see the following:

           Seeks Scans Lookups Updates TotalKB  UsedKB
HEAP       0     2     1500    65000   1810736  1757128
Non Cl.IX  1500  0     0       65000   56280    56240

I am thinking about adding a primary key on the (obviously previously for this purpose implemented) "ID" column – just to follow the normalization requirements and to have it.
I am also thinking if a clustered index would be helpful or not… There is not a really good candidate to fullfil the Clustered index requirements. Maybe CDate is the least bad one…?

Unfortunately I do not have insight in regulary running queries that I could use to identify if there are range queries and to compare before / after performance with.

Questions:

  1. After adding a primary key – can I expect any performance changes / improvements?
  2. Is it even possible to add a primary key WITHOUT adding a clustered index?
  3. Would you suggest adding a clustered index in this scenario to improve performance? I have read tons of opinitons, discussions and benchmark tests about this – Brent Ozar says "create it as long you can't prove it's better without" – others say (sorry, it is in German) performance usually is not getting better but ofthen worse having a clustered index in place because it requires more reads to get the required data, a benchmark on SE resulted in "Ci are good fur updates only" => so right now I am completely confused what to do with my several HEAP tables (like this one).

Best Answer

To answer question 2 first

2

A primary Key is NOT always the clustered index, it can be the clustered index and in the majority of cases is the way things are done, but it isn't always the best for your data. The culstered index is the order in which your data is physically stored on the disk whereas the primary key (which can be composite) designates the field to be a unique field and is beneficial for not inserting duplicate values and for foreign key lookups if you wish to do joins (In summary for 2, yes you can add a primary key without a clustered index)

NOTE: sometimes if you want an identifiable row you can add a new field to the table to simply act as the primary key (not always advisable, but can sometimes be the a solution to improve performance)

1

Adding a primary key can change performance, however the only true way to know if it's going to improve performance is to test it, if you have a pre-live environment consider adding it to there and running your queries across it.

If you have queries that run as joins to this table on ID,AID,BID all together (Sorry I'm not 100% sure how these are all coming together) the Potentially create a composite primary key across all three which means when anything wishes to get data with this table comparing all three of those it can find that row with ease. (Hope this makes sense)

3

Adding a Clustered Index completely depends on your data, once again a pre-live environment would be an ideal situation for testing.

A few things to consider when creating a clustered index, what data are you retrieving and what are you inserting

(this is a general example)If you're inserting and retrieving data that is the most recent data then a clustered index on the date field sounds the best idea, however if there is a LOT of data going in and out you will have very high contention on the most recent pages in your table, an alternative would be to have the clustered index around a category that those dates are on, eg client, this would mean that the data is grouped by a client which is more likely to have data gathered by, and spreads the read write load across the disk / disks

If the data retrieve is very random then a clustered index is quite pointless, if the data you get back has no real order to it then a Heap is completely acceptable.

Ultimately there is no be all answer to should I add a clustered index or a primary key because every situation is slightly different and will react in different ways.

If you have a pre-live environment (even a cut down version) can help make your decisions. Personally we have tables with primary clustered composite keys, and some tables that are simply heaps.

Hope this helps (and makes sense, sometimes I find I ramble)