Sql-server – Primary keys, clustered indexes and partitioning

database-designenterprise-editionpartitioningsql serversql-server-2012

We store data for financial records, splitting it up over 4 tables. I'm at a bit of a loss as to what's optimal for our setup here, or at least which direction to head in as I'm getting a lot of conflicting advice and resistance, and don't have all the facts to decide on ways forward. Below is essentially what all 4 tables look like.

CREATE TABLE [Example].[MasterRecordTable](
    [rowDateTime] [datetime] NOT NULL CONSTRAINT [rowDateTime] DEFAULT (GETDATE()),
    [recordID] [int] IDENTITY(1,1) NOT NULL,
    [date] [date] NOT NULL,
    [field1] ...,
    ...
 CONSTRAINT [PK_MRT] PRIMARY KEY CLUSTERED 
(
    [recordID] ASC,
    [date] ASC
)

CREATE TABLE [Example].[ChildTables](
    [recordID] [int] IDENTITY(1,1) NOT NULL,
    [date] [date] NOT NULL,
    [field1] ...,
    ...
 CONSTRAINT [PK_CT] PRIMARY KEY CLUSTERED 
(
    [recordID] ASC,
    [date] ASC
)
  • All tables use a clustered primary key that covers recordID and date, even though at no point while processing do we JOIN on or include date in WHERE clauses, and it's only recordID that needs to be a primary key. To me date should not be present in the primary key or clustered index.
  • All tables carry the date field, is this necessary if you consider my question about archiving/partioning below? The only reason it's in all tables is to assist with our manual archiving process.
  • We do not make use of foreign keys, and I'd like to know if this is worth reconsidering
  • We need an archiving strategy. At the moment we have tables with the same structure as above, with _Archive added to the name and placed in a separate file-group and hard drive. We then manually move records WHERE date <= @aYearAgo for each table over to it's _Archive equivalent, daily. This, as well as developing queries that bridge both tables, is tedious and time-consuming. We're busy evaluating partitioning, and I'd like to have some well-informed answers as to what setup is ideal considering the table structure above. It makes sense to partition and archive based on the date value. We'd like to be able to move older data over to progressively slower hard-drives, then deleted after 5 years.
  • Would it not be more optimal to have a clustered primary key on recordID alone, does the partitioning field (date) have to be apart of the clustered index?
  • The only time we need date is when we report on these tables. Depending on your answers above, I imagine this can be done best with an NC index on the Master table, which is then joined to the Child tables via recordID.

Please let me know if anything is unclear. This is a dump of the concerns off the top of my head, I'm keen to work with someone knowledgeable on indexes, key relationships, and partitioning. Thanks!

Best Answer

at no point while processing do we JOIN on or include date in WHERE clauses

All partitions will need to be touched when the partitioning column is not specified in query predicates. Put yourself in SQL Server's shoes - how would you know which partitions to access (or not) without knowing the partitioning column value? Consider that a singleton index seek without partition elimination will seek against every partition. Compared to a non-partitioned table, this seek overhead will be more noticeable when few rows are returned. There won't be as much difference with full scans, though.

Would it not be more optimal to have a clustered primary key on recordID alone, does the partitioning field (date) have to be apart of the clustered index?

Yes, recordID alone would be more optimal but the partitioning column must be part of the clustered index key as well as all unique indexes. The implication is that primary key and unique constraints must include the partitioning column as part of the key.

it's only recordID that needs to be a primary key. To me date should not be present in the primary key or clustered index.

Without knowing your data I can't say if date should be part of the primary key or not. Assuming not from a data model perspective, you cannot partition the primary key index without it. One approach is to use a non-clustered non-partitioned primary key, considering the performance implications on queries.

Tiered storage for less frequently accessed data is a use case for table partitioning with different file groups. You will still need to develop a method to either physical move the underlying files or physically move rows from one filegroup to another. Partition SWITCH alone cannot do the job.