Should Archive Tables Have Their Own Surrogate Identity Key in SQL Server?

archiveperformanceperformance-tuningsql serversql-server-2012

Should archive tables have their own surrogate identity id/key? For a customer sales table example:

  • We have original table, and then an additional archive table with surrogate archive id.
  • Rules for primary clustered indexes are:
    • Ever Increasing
    • Narrow
    • Unique
    • Static

Reference: Effective Clustered Indexes (Red-Gate Hub)

At the last company I worked for, there was a thought, that if we accidentally missed a day or period of importing, e.g. imported data for May 3, accidentally skipped May 4 (issue with system), and imported May 5, that without the new archive id, we would have to reinsert data between the pages, causing fragmentation, and slower inserts.

With the archive id, we can just add on in an ever-increasing manner.

Just inquiring if adding archiveid column is a standard industry practice.

Original Table:

create table dbo.CustomerSalesId
(
    CustomerSalesId bigint primary key identity(1,1),
    CustomerId bigint not null,
    PurchaseDate datetime not null,
    Amount decimal (10,2) not null,
    .........
}

Archive Table:

create table dbo.ArchiveCustomerSalesId
(
    ArchiveCustomerSalesId bigint primary key identity(1,1),
    CustomerSalesId bigint,
    CustomerId bigint not null,
    PurchaseDate datetime not null,
    Amount decimal (10,2) not null,
    .........
}
create unique index ux_CustomerSalesId on ArchiveCustomerSalesId(CustomerSalesId)

Best Answer

Fragmentation is not as bad as you think it is. Page fullness (internal fragmentation) is much more important than the physical order of the pages (external fragementation).

Having rows inserted out of order might affect the performance of scans when the pages are read from disk. When reading pages from memory, it is not important whether the pages are in the correct order or not. Also, you are probably not working with a physical server and your disk is likely a LUN sitting on a SAN, so the advantage of page contiguity is negligible.

A good visual description of internal vs. external fragmentation can be found here: http://www.sqlservercentral.com/blogs/practicalsqldba/2012/04/05/sql-server-index-fragmentation-understanding-fragmentation/

A great article on why fragementation is not what you think can be found on Brent Ozar's blog: https://www.brentozar.com/archive/2012/08/sql-server-index-fragmentation/

That said, using an additional surrogate key just for avoiding fragmentation is not a good idea. A good clustered index is, first of all, the key that you use the most to locate the data, so that you don't have to perform additional lookups. If you alter the clustering key when you archive the data, to locate the rows you would have to use a unique nonclustered index on the column(s) of the previous clustering key. A nonclusterd index is a copy of the data that requires additional space and also requires lookups to extract columns that are not part of the index.

Wrapping up: no, you don't need an additional "archiveId" surrogate key.