Sql-server – Identity column as clustered index bad idea

clustered-indexidentitysql server

Its just been a few months of programming in SQL Server for me so my knowledge is not good in many regards. In an already existing project at work I came across many tables with large composite primary keys with clustered index. From what I have gathered, a large column/composite column with clustered index hits performance very hard, and at times the logical solution is an identity column. But at the same time I have come across many people flaming the over-usage of identity columns.

But I have never came across an example where identity column is a bad idea.

Recently we have standardized that every table should have an identity column as the clustered index – whether we use it as PK or not, as we require it for some export purposes.

So I would like some examples, in real life scenarios, where using an identity column as a clustered index is a bad idea.

Though at times it makes our life easy I have never encountered a scenario where it will be considered bad.

PS: I think my question is a bit naive but it is bugging me so much so I had to ask about it.

Best Answer

I usually use an identity column as clustered primary key. However in some (rare?) cases this is not ideal because of the LastPageInsertLatchContention. This happens if a table is heavely filled with data. Because of the identity key all this INSERT's wants to write the last page of the table (index). So this page can be locked and the performance may be better with another solution.

See

for details.

Related Solutions

Sql-server – Clustered Index Help, where did i go wrong

Are there exceptions to the above clustered index rules?

The general guidance for selecting a clustered index is good, but there are sometimes additional considerations to be taken into account. Occasionally, these extra factors can be more important than the general 'rules'.

Your scenario is somewhat "special" in that you have a very wide table, with a set of queries that request a (presumably) unpredictable set of columns, although the query predicates are normally the same.

The original clustered index arrangement was likely expensive for operations that change data, since changing any part of the clustering key means changing all nonclustered indexes as well. In addition, physically moving whole rows is costly in terms of log generation, especially where a page split occurs.

That said, once a clustered index has been modified and split a fair bit, it will contain quite a bit of free space, making future movements less intensive, just as if a sensible fill factor had been set in the first place, and maintained as part of normal index maintenance.

It seems likely your production system had fallen into a sort of equilibrium, where page splitting occurred at a steady, reasonable rate. Nonclustered index maintenance would still be relatively expensive, but it seems that was not a dominant factor.

The crucial advantages of the original indexing arrangement were:

The clustered index (Company, Region, FlagA, FlagB, FlagC) matches the predicate of:
```
SELECT {unpredictable column list}
FROM table 
WHERE Company = 'company' 
AND Region = 'region' 
AND flagA= 'A'
AND flagB = 'B'
AND flagC = 'C';
```
...while at the same time providing access to whichever columns are listed in the select list.
Queries of the form:
```
SELECT * 
FROM table 
WHERE ID = 1234;
```
...were adequately supported by the nonclustered primary key. This query always returns a single row, so only a single lookup via the clustered index is required after the row is located in the nonclustered index.

Changing to a clustered index on ID, and a nonclustered index on (Company, Region, FlagA, FlagB, FlagC) makes the second query a little more efficient (eliminating one lookup per query), but makes the first query much less efficient (replacing zero lookups with ~5000).

Moreover, it is very possible the optimizer might choose not to use the nonclustered index at all, estimating that a full scan of the table would be cheaper than the 5000 lookups.

You're probably better off leaving the two main indexes as they are for the time being, while you sort out the 40-odd nonclustered indexes, and analyse the workload for the minimal set of covering nonclustered indexes that would be required. Once that data is available, you will be in a better position to consider fundamental indexing changes.

You may also want to check the existing monitoring and maintenance routines for this table. In many scenarios, the 'rule-breaking' clustered index will be just fine, if the table is given a fill factor that just prevents significant page splitting before the next maintenance window comes around.

Sql-server – Why is removing the Identity property on a column not supported

Your question is, essentially:

Why can I no longer do this risky thing that I should never have been allowed to do in the first place?

The answer to that question is largely irrelevant (though you can see some Microsoft comments in these Connect items asking for this functionality: #294193 and #252226). For completeness, my synopsis is: The ability to remove the identity property was an unintended side effect of having the ability to mess with the system tables in the first place. This was not intended to be used in the many ways that it was, often with very bad consequences, and thus it was removed. It was an undocumented, unsupported, system table hack. The ability to change data in system tables wasn't removed because Microsoft no longer wanted you to hack your way out of a column being an identity column, it was removed because mucking with the system tables is extremely risky. Removing the IDENTITY property itself wasn't a specifically targeted feature removal, and I would have never fully trusted this approach even back in the ancient days when it was possible.

That said, how about we answer this question instead?

How do I remove the IDENTITY property of a column with minimal or no downtime?

This you can do easily, using ALTER TABLE ... SWITCH, a technique I am certain I first learned from our own Paul White in the workarounds for Connect #252226. Quick example, given this simple table:

CREATE TABLE dbo.Original
(
  ID INT IDENTITY(1,1) PRIMARY KEY,
  name SYSNAME
);
GO

INSERT dbo.Original(name) VALUES(N'foo'),(N'bar');
GO

SELECT * FROM dbo.Original;
GO

Results:

ID  name
--  ----
1   foo
2   bar

Now, let's create a shadow table, and switch to it, then drop the old table, rename the new one, and then resume normal activity:

CREATE TABLE dbo.New
(
  ID INT PRIMARY KEY,
  name SYSNAME
);
GO

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
  ALTER TABLE dbo.Original SWITCH TO dbo.New;
  DROP TABLE dbo.Original;
  EXEC sys.sp_rename N'dbo.New', N'Original', 'OBJECT';
COMMIT TRANSACTION;
GO

INSERT dbo.Original(ID,name) VALUES(3,N'splunge');
UPDATE dbo.Original SET ID = 6 WHERE ID = 1;
GO

SELECT * FROM dbo.Original;
GO

Results:

ID  name
--  -------
2   bar
3   splunge
6   foo

Now clean up:

DROP TABLE dbo.Original;

This is a metadata operation only, with no data movement, and will only block other users while the metadata is being updated. But, admittedly, it is a very simplistic example. If you have foreign keys or are using other features like replication, Change Data Capture, Change Tracking, etc. you may need to disable or remove some of those before making this change (I haven't tested all combinations). For foreign keys specifically, see this tip which shows how to generate scripts to drop and re-create all (or selected) foreign key constraints.

Additionally, you will need to update your application code to not expect SQL Server to populate this column, and check any insert or select statements that may be depending on the order of columns or the columns they need to specify. Generally, I would grep your entire code base for any mention of this table.

Also see this script from Itzik Ben-Gan (source: this ancient article) for another way to handle this, but there is data movement involved here, so it doesn't deliver on the "no or minimal downtime" requirement.

Best Answer

Related Solutions

Sql-server – Clustered Index Help, where did i go wrong

Sql-server – Why is removing the Identity property on a column not supported

Related Question