We store data for financial records, splitting it up over 4 tables. I'm at a bit of a loss as to what's optimal for our setup here, or at least which direction to head in as I'm getting a lot of conflicting advice and resistance, and don't have all the facts to decide on ways forward. Below is essentially what all 4 tables look like.
CREATE TABLE [Example].[MasterRecordTable](
[rowDateTime] [datetime] NOT NULL CONSTRAINT [rowDateTime] DEFAULT (GETDATE()),
[recordID] [int] IDENTITY(1,1) NOT NULL,
[date] [date] NOT NULL,
[field1] ...,
...
CONSTRAINT [PK_MRT] PRIMARY KEY CLUSTERED
(
[recordID] ASC,
[date] ASC
)
CREATE TABLE [Example].[ChildTables](
[recordID] [int] IDENTITY(1,1) NOT NULL,
[date] [date] NOT NULL,
[field1] ...,
...
CONSTRAINT [PK_CT] PRIMARY KEY CLUSTERED
(
[recordID] ASC,
[date] ASC
)
- All tables use a clustered primary key that covers
recordID
anddate
, even though at no point while processing do we JOIN on or includedate
in WHERE clauses, and it's onlyrecordID
that needs to be a primary key. To medate
should not be present in the primary key or clustered index. - All tables carry the
date
field, is this necessary if you consider my question about archiving/partioning below? The only reason it's in all tables is to assist with our manual archiving process. - We do not make use of foreign keys, and I'd like to know if this is worth reconsidering
- We need an archiving strategy. At the moment we have tables with the same structure as above, with
_Archive
added to the name and placed in a separate file-group and hard drive. We then manually move recordsWHERE date <= @aYearAgo
for each table over to it's_Archive
equivalent, daily. This, as well as developing queries that bridge both tables, is tedious and time-consuming. We're busy evaluating partitioning, and I'd like to have some well-informed answers as to what setup is ideal considering the table structure above. It makes sense to partition and archive based on thedate
value. We'd like to be able to move older data over to progressively slower hard-drives, then deleted after 5 years. - Would it not be more optimal to have a clustered primary key on
recordID
alone, does the partitioning field (date
) have to be apart of the clustered index? - The only time we need
date
is when we report on these tables. Depending on your answers above, I imagine this can be done best with an NC index on the Master table, which is then joined to the Child tables via recordID.
Please let me know if anything is unclear. This is a dump of the concerns off the top of my head, I'm keen to work with someone knowledgeable on indexes, key relationships, and partitioning. Thanks!
Best Answer
All partitions will need to be touched when the partitioning column is not specified in query predicates. Put yourself in SQL Server's shoes - how would you know which partitions to access (or not) without knowing the partitioning column value? Consider that a singleton index seek without partition elimination will seek against every partition. Compared to a non-partitioned table, this seek overhead will be more noticeable when few rows are returned. There won't be as much difference with full scans, though.
Yes, recordID alone would be more optimal but the partitioning column must be part of the clustered index key as well as all unique indexes. The implication is that primary key and unique constraints must include the partitioning column as part of the key.
Without knowing your data I can't say if date should be part of the primary key or not. Assuming not from a data model perspective, you cannot partition the primary key index without it. One approach is to use a non-clustered non-partitioned primary key, considering the performance implications on queries.
Tiered storage for less frequently accessed data is a use case for table partitioning with different file groups. You will still need to develop a method to either physical move the underlying files or physically move rows from one filegroup to another. Partition
SWITCH
alone cannot do the job.