SQL Server – Replacing composite key with surrogate

database-design fragmentation primary-key sql-server surrogate-key

I have the following table, which has a couple of million rows in it and is 99% fragmented virtually all of the time. My plan is to add an IDENTITY column as a surrogate key to replace the current six-column composite primary key, then make the current key a unique key for referential integrity and recreate the indexes.

    CREATE TABLE [dbo].[Autocompleter](
        [CountryId] [int] NOT NULL,
        [ProvinceId] [int] NOT NULL,
        [LocationId] [int] NOT NULL,
        [PlaceId] [int] NOT NULL,
        [EstabId] [int] NOT NULL,
        [LocaleId] [int] NOT NULL,
        [Title] [varchar](400) NOT NULL,
        [Hotels] [int] NULL,
        [AlternateTitles] [varchar](4000) NULL,
        [EnableHotels] [bit] NOT NULL,
        [EnableHolidays] [bit] NOT NULL,
        [DisplayPriority] [int] NOT NULL,
     CONSTRAINT [PK_autocompleter_1] PRIMARY KEY CLUSTERED 
    (
        [CountryId] ASC,
        [ProvinceId] ASC,
        [LocationId] ASC,
        [PlaceId] ASC,
        [EstabId] ASC,
        [LocaleId] ASC
    )
    );

Are there any gotchas I should be looking out for? Since the new column is an IDENTITY column, I am thinking this should not break code that inserts into the table (as long as that code specifies the columns explicitly).

I plan to create a new clustered index on the surrogate key and then turn the current clustered index on the six columns into a nonclustered index.

Best Answer

From what I have read on Kimball dimensional modeling, a surrogate key, in particular an IDENTITY column, helps reduce fragmentation caused by page splits: new rows are appended at the end of the leaf pages instead of being inserted in the middle, which is what happens when keys do not arrive in the order of the index (ascending or descending, depending on how the index was defined).
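The plan from the question could be sketched roughly as follows. This is a minimal sketch, not a tested migration script: the column name AutocompleterId and the constraint names PK_Autocompleter and UQ_Autocompleter_NaturalKey are hypothetical, and it assumes no foreign keys currently reference PK_autocompleter_1 (the drop would fail if any did).

```sql
-- Sketch only: try this on a copy of the table first.

-- 1. Drop the existing clustered primary key; the table becomes a heap.
ALTER TABLE dbo.Autocompleter
    DROP CONSTRAINT PK_autocompleter_1;

-- 2. Add the surrogate key and cluster on it in one statement.
--    SQL Server assigns identity values to the existing rows.
--    AutocompleterId and PK_Autocompleter are hypothetical names.
ALTER TABLE dbo.Autocompleter
    ADD AutocompleterId INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Autocompleter PRIMARY KEY CLUSTERED;

-- 3. Keep enforcing the old six-column key as a nonclustered unique constraint.
ALTER TABLE dbo.Autocompleter
    ADD CONSTRAINT UQ_Autocompleter_NaturalKey UNIQUE NONCLUSTERED
    (CountryId, ProvinceId, LocationId, PlaceId, EstabId, LocaleId);
```

Dropping and recreating the clustered index rewrites the table and rebuilds any nonclustered indexes on it, so on a table with a couple of million rows this is best done in a quiet period. Code that uses INSERT without a column list, or that relies on SELECT * column positions, would need checking afterwards.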

But, if the surrogate key will not be used in joins with other tables or in WHERE criteria, use of the surrogate key as a clustered index might not provide any additional benefit after the initial data import. If you keep the composite clustered key, sorting the source data in the order of the six-column composite key prior to import should avoid the page split fragmentation you're witnessing.
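If the composite clustered key is kept, a sorted load might look like the sketch below. dbo.Autocompleter_Staging is a hypothetical staging table assumed to have the same columns as dbo.Autocompleter. The second statement uses sys.dm_db_index_physical_stats to check whether fragmentation actually improved.

```sql
-- Load from a hypothetical staging table in clustered-key order.
INSERT INTO dbo.Autocompleter
    (CountryId, ProvinceId, LocationId, PlaceId, EstabId, LocaleId,
     Title, Hotels, AlternateTitles, EnableHotels, EnableHolidays, DisplayPriority)
SELECT
     CountryId, ProvinceId, LocationId, PlaceId, EstabId, LocaleId,
     Title, Hotels, AlternateTitles, EnableHotels, EnableHolidays, DisplayPriority
FROM dbo.Autocompleter_Staging
ORDER BY CountryId, ProvinceId, LocationId, PlaceId, EstabId, LocaleId;

-- Measure fragmentation of every index on the table after the load.
SELECT
    i.name AS IndexName,
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Autocompleter'), NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
    ON i.[object_id] = ps.[object_id]
   AND i.index_id = ps.index_id;
```

The sorted load only helps while rows arrive in key order. Later single-row inserts with keys that fall in the middle of the index can still cause page splits.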

As for the choice of clustered index, the best candidate may be the column(s) used in the WHERE criteria or joins of your most common or critical queries. A nonclustered index (NCI) will require a bookmark lookup to fetch any remaining columns that the result set needs but the index does not contain.
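One common way to avoid the bookmark lookup is to make the nonclustered index cover the query with INCLUDE columns. The sketch below assumes, purely for illustration, that the critical query filters on LocaleId and a Title prefix and returns the three included columns. The index name and the query shape are hypothetical and are not taken from the question.

```sql
-- Hypothetical covering index for an assumed query pattern such as:
--   SELECT Title, DisplayPriority, EnableHotels, EnableHolidays
--   FROM dbo.Autocompleter
--   WHERE LocaleId = @LocaleId AND Title LIKE @Prefix + '%';
CREATE NONCLUSTERED INDEX IX_Autocompleter_LocaleId_Title
    ON dbo.Autocompleter (LocaleId, Title)
    INCLUDE (DisplayPriority, EnableHotels, EnableHolidays);
```

Included columns are stored only at the leaf level of the index, so they widen the index without becoming part of its key. Whether the extra storage and write cost is worth it depends on how often the covered query runs.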

Related Solutions

SQL Server – Unable to Drop Non-PK Index Referenced in Foreign Key Constraint

This happens because a foreign key can reference either a primary key or a unique constraint, and whoever created that foreign key may have created it before the primary key existed (or pointed it at the unique index while changing something else about the primary key). This is easy to reproduce:

CREATE TABLE dbo.MyTable(MyTableID INT NOT NULL, CONSTRAINT myx UNIQUE(MyTableID));

CREATE TABLE dbo.OtherTable1(ID INT FOREIGN KEY REFERENCES dbo.MyTable(MyTableID));

ALTER TABLE dbo.MyTable ADD CONSTRAINT PKmyx PRIMARY KEY(MyTableID);

CREATE TABLE dbo.OtherTable2(ID INT FOREIGN KEY REFERENCES dbo.MyTable(MyTableID));

In fact, both of these foreign keys will point to the first unique constraint defined on that column (myx).
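To check which index each foreign key is actually bound to, the catalog views can be queried directly. The following is a small sketch against the repro tables above: sys.foreign_keys.key_index_id identifies the index on the referenced table that backs the foreign key.

```sql
-- List each foreign key that references dbo.MyTable and the index it is bound to.
SELECT
    fk.name                          AS ForeignKeyName,
    OBJECT_NAME(fk.parent_object_id) AS ReferencingTable,
    i.name                           AS ReferencedIndex,
    i.is_primary_key,
    i.is_unique_constraint
FROM sys.foreign_keys AS fk
INNER JOIN sys.indexes AS i
    ON i.[object_id] = fk.referenced_object_id
   AND i.index_id = fk.key_index_id
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.MyTable');
```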

You can fix the foreign key on the other table by dropping it and re-creating it. You will need to repeat that process for any other tables that point to this column. You can find these easily:

SELECT s.name,t.name,fk.name
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.foreign_keys AS fk
ON fkc.constraint_object_id = fk.[object_id]
INNER JOIN sys.tables AS t
ON fkc.parent_object_id = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
INNER JOIN sys.columns AS c1
ON c1.[object_id] = fkc.referenced_object_id
AND c1.column_id = fkc.referenced_column_id
AND c1.name = N'MyTableID'
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.MyTable');

Results:

dbo    OtherTable1    FK__OtherTable1__ID__32E0915F
dbo    OtherTable2    FK__OtherTable2__ID__35BCFE0A

And even generate a script to drop and re-create them (dropping the redundant unique constraint in the meantime):

DECLARE 
  @sql1 NVARCHAR(MAX) = N'', 
  @sql2 NVARCHAR(MAX) = N'ALTER TABLE dbo.MyTable DROP CONSTRAINT myx;', 
  @sql3 NVARCHAR(MAX) = N'';

SELECT 
  @sql1 += N'
ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
  + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) + ';',
  @sql3 += N'
ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
  + ' ADD CONSTRAINT ' + QUOTENAME(fk.name) + ' FOREIGN KEY '
  + '(' + QUOTENAME(c2.name) + ') REFERENCES dbo.MyTable(MyTableID);'
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.foreign_keys AS fk
ON fkc.constraint_object_id = fk.[object_id]
INNER JOIN sys.tables AS t
ON fkc.parent_object_id = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
INNER JOIN sys.columns AS c1
ON c1.[object_id] = fkc.referenced_object_id
AND c1.column_id = fkc.referenced_column_id
AND c1.name = N'MyTableID'
INNER JOIN sys.columns AS c2
ON c2.[object_id] = fkc.parent_object_id
AND c2.column_id = fkc.parent_column_id
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.MyTable');

PRINT @sql1;
PRINT @sql2;
PRINT @sql3;
-- EXEC sp_executesql @sql1;
-- EXEC sp_executesql @sql2;
-- EXEC sp_executesql @sql3;

Results:

ALTER TABLE [dbo].[OtherTable1] DROP CONSTRAINT [FK__OtherTable1__ID__32E0915F];
ALTER TABLE [dbo].[OtherTable2] DROP CONSTRAINT [FK__OtherTable2__ID__35BCFE0A];

ALTER TABLE dbo.MyTable DROP CONSTRAINT myx;

ALTER TABLE [dbo].[OtherTable1] ADD CONSTRAINT [FK__OtherTable1__ID__32E0915F] 
  FOREIGN KEY ([ID]) REFERENCES dbo.MyTable(MyTableID);
ALTER TABLE [dbo].[OtherTable2] ADD CONSTRAINT [FK__OtherTable2__ID__35BCFE0A] 
  FOREIGN KEY ([ID]) REFERENCES dbo.MyTable(MyTableID);

This explicitly handles this case, where the constraint only involves a single column. It gets a little more complex if there are multiple columns involved (and this answer is not meant to solve that problem). I also didn't test if this works exactly as coded if the foreign keys point to a redundant unique index (which has the same underlying structure but is created with slightly different DDL). Exercise for the reader. :-)
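For the multi-column case, one possible starting point is to aggregate the column lists per foreign key. The sketch below is not part of the original answer and assumes SQL Server 2017 or later, where STRING_AGG is available. It only lists the column pairs in constraint order and leaves building the DROP and ADD statements to the reader.

```sql
-- One row per foreign key referencing dbo.MyTable, with comma-separated column lists.
SELECT
    QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) AS ReferencingTable,
    QUOTENAME(fk.name)                           AS ForeignKeyName,
    STRING_AGG(QUOTENAME(pc.name), N', ')
        WITHIN GROUP (ORDER BY fkc.constraint_column_id) AS ReferencingColumns,
    STRING_AGG(QUOTENAME(rc.name), N', ')
        WITHIN GROUP (ORDER BY fkc.constraint_column_id) AS ReferencedColumns
FROM sys.foreign_keys AS fk
INNER JOIN sys.foreign_key_columns AS fkc
    ON fkc.constraint_object_id = fk.[object_id]
INNER JOIN sys.tables AS t
    ON t.[object_id] = fk.parent_object_id
INNER JOIN sys.schemas AS s
    ON s.[schema_id] = t.[schema_id]
INNER JOIN sys.columns AS pc
    ON pc.[object_id] = fkc.parent_object_id
   AND pc.column_id = fkc.parent_column_id
INNER JOIN sys.columns AS rc
    ON rc.[object_id] = fkc.referenced_object_id
   AND rc.column_id = fkc.referenced_column_id
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.MyTable')
GROUP BY s.name, t.name, fk.name, fk.[object_id];
```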

SQL Server – Finding Tables Without Explicit Primary Key

There are a couple of ways to skin this cat, but this one works fine in SQL Server 2005 and up, and I find it a pain-free way to handle the problem:

The OBJECTPROPERTY() function can list various properties about objects - like tables. One of those properties is whether or not a table has a primary key.

OBJECTPROPERTY(object_id, 'TableHasPrimaryKey') = 0 identifies a table without a primary key.

SELECT OBJECT_SCHEMA_NAME( object_id ) as SchemaName, name AS TableName
FROM sys.tables
WHERE OBJECTPROPERTY(object_id, 'TableHasPrimaryKey') = 0
ORDER BY SchemaName, TableName ;

That should give you what you need. The other uses of the OBJECTPROPERTY() function are described in Books Online; the linked article is the SQL Server 2012 version.
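As one of the other ways to skin this cat, the same list can be produced from the catalog views alone. This is a sketch of an alternative, not the method the answer recommends: it relies on sys.indexes.is_primary_key, which is 1 for the index that backs a primary key constraint.

```sql
-- Tables that have no index flagged as a primary key.
SELECT
    s.name AS SchemaName,
    t.name AS TableName
FROM sys.tables AS t
INNER JOIN sys.schemas AS s
    ON s.[schema_id] = t.[schema_id]
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.indexes AS i
    WHERE i.[object_id] = t.[object_id]
      AND i.is_primary_key = 1
)
ORDER BY SchemaName, TableName;
```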
