Sql-server – Replacing composite key with surrogate

database-designfragmentationprimary-keysql serversurrogate-key

I have the following table that has a couple of million rows in it and is 99% fragmented virtually all of the time. My plan was to insert a IDENTITY field as a surrogate key to replace the current composite 6 field primary, then make the current key a unique key for referential integrity and recreate the indexes.

    CREATE TABLE [dbo].[Autocompleter](
        [CountryId] [int] NOT NULL,
        [ProvinceId] [int] NOT NULL,
        [LocationId] [int] NOT NULL,
        [PlaceId] [int] NOT NULL,
        [EstabId] [int] NOT NULL,
        [LocaleId] [int] NOT NULL,
        [Title] [varchar](400) NOT NULL,
        [Hotels] [int] NULL,
        [AlternateTitles] [varchar](4000) NULL,
        [EnableHotels] [bit] NOT NULL,
        [EnableHolidays] [bit] NOT NULL,
        [DisplayPriority] [int] NOT NULL,
     CONSTRAINT [PK_autocompleter_1] PRIMARY KEY CLUSTERED 
    (
        [CountryId] ASC,
        [ProvinceId] ASC,
        [LocationId] ASC,
        [PlaceId] ASC,
        [EstabId] ASC,
        [LocaleId] ASC
    )

Any gotchas I should be looking out for ? if it is an identity field I am thinking this should not break code that inserts into the table (as long as it specifies the columns explicitly)

I plan to create a new clustered index on the surrogate key and then make the current clustered index on the 6 fields a NC index.

Best Answer

Reading up on Kimball dimensional modeling, use of a surrogate key, in particular an IDENTITY, will help reduce fragmentation caused by page splits as you'll be appending rows at the end of leaf pages vs. attempting to insert them in the middle if the keys are not in the order of the index (ascending or descending depending on how the index was defined).

But, if the surrogate key will not be used in joins with other tables or in WHERE criteria, use of the surrogate key as a clustered index might not provide any additional benefit after the initial data import. If you keep the composite clustered key, sorting the source data in the order of the six-column composite key prior to import should avoid the page split fragmentation you're witnessing.

As for a choice of clustered index, the best candidate may be the column(s) that would be used in your most common/critical queries for WHERE criteria or joins. Use of a NCI will require a bookmark lookup to obtain the remaining values in the table if needed in the resultset.