How to Add an Identity Field as New Cluster in SQL Server

clustered-indexidentitysql-server-2012

I have a very large table (88m rows, nearly 300GB in size) that has a poorly planned clustered index. I want to add a BIGINT identity field, and make it the new clustered index (having the desirable attributes of unique, narrow, static, and ever-increasing).

The problem is that I also want this new identity field to be ordered in a certain way on the current records (based on an existing date field that is not the current cluster). This is so when we purge the oldest records, those will all be contiguous on disk. (The date field isn't unique, though, so isn't a good candidate for the cluster by itself.)

How can I get the values for this new identity field ordered the way I want?

The first idea I had was to make a new table with the identify field, and just INSERT INTO rows from the current table ORDER BY the date field? Would that work?

If I don't have enough disk space to do that, is there any way to do that "in-place"? Perhaps by changing the clustered index to the date field (wait a long time), then add the identity (and wait a long time again)? Are the values of the identity guaranteed to be ordered by the current cluster?

Best Answer

I just performed the following experiment on a SQL Server 2012 box:

CREATE TABLE rdtest (label varchar(100), value int);

INSERT INTO rdtest VALUES
       ('one', 1)
      ,('two', 4)
      ,('three', 9)
      ,('four', 16)
      ,('five', 25)
;

ALTER TABLE rdtest ADD test_id int IDENTITY(1,1);

SELECT * FROM rdtest;

Result:

label    value       test_id
-------- ----------- -----------
one      1           1
two      4           2
three    9           3
four     16          4
five     25          5

Next:

ALTER TABLE rdtest DROP COLUMN test_id;

ALTER TABLE rdtest ALTER COLUMN value int NOT NULL;
GO

ALTER TABLE rdtest ADD CONSTRAINT PK_rdtest PRIMARY KEY ([value] DESC);

ALTER TABLE rdtest ADD test_id int IDENTITY(1,1);

SELECT * FROM rdtest;

Results:

label    value       test_id
-------- ----------- -----------
five     25          1
four     16          2
three    9           3
two      4           4
one      1           5

And, one more test run:

ALTER TABLE rdtest DROP CONSTRAINT PK_rdtest;

ALTER TABLE rdtest DROP COLUMN test_id;

ALTER TABLE rdtest ALTER COLUMN label varchar(100) NOT NULL;
GO

ALTER TABLE rdtest ADD CONSTRAINT PK_rdtest PRIMARY KEY ([label] ASC);

ALTER TABLE rdtest ADD test_id int IDENTITY(1,1);

SELECT * FROM rdtest;

Results:

label    value       test_id
-------- ----------- -----------
five     25          1
four     16          2
one      1           3
three    9           4
two      4           5

I suppose this isn't a guarantee that the clustered index dictates the order a new IDENTITY column gets filled in - but, I'd say it strongly indicates that it's very likely.