SQL Server: Database Table Partitioning Considerations

I have a SQL Server table with more than 42 million records. This table acts as a staging area for transaction records loaded from a different system. Data from this staging table has to be transformed and moved into a data warehouse database. This is done by a stored procedure that performs complex joins against other staging tables of similar size.

Below is the structure of the staging table:

  • Id
  • SystemCode
  • TransactionDate
  • ProcessStatus
  • DataField1
  • DataField2
  • DataFieldN

I am thinking about partitioning the staging tables on the SystemCode field. Data will be loaded from around 40 distinct systems. Once records are processed, the ProcessStatus flag is updated to prevent them from being processed again.
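To make the proposal concrete, here is a toy Python model (not SQL Server code) of how a partition function assigns rows to partitions. This is roughly what `$PARTITION.<function>(SystemCode)` reports in SQL Server. It assumes, hypothetically, that SystemCode is an integer from 1 to 40 with one boundary value per code. `partition_number` and `boundaries` are illustrative names only. With RANGE RIGHT, 39 boundaries (2 through 40) give 40 partitions, one per system.

```python
from bisect import bisect_left, bisect_right


def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition a value falls into (toy model).

    boundaries must be sorted ascending; n boundaries give n + 1 partitions.
    """
    if range_right:
        # RANGE RIGHT: a boundary value is the lowest value of the partition
        # to its right, so a value equal to a boundary moves right.
        return bisect_right(boundaries, value) + 1
    # RANGE LEFT: a boundary value is the highest value of the partition
    # to its left, so a value equal to a boundary stays left.
    return bisect_left(boundaries, value) + 1


# Hypothetical layout: SystemCode is an int 1..40, one boundary per code 2..40.
boundaries = list(range(2, 41))  # 39 boundaries -> 40 partitions

for code in (1, 2, 17, 40):
    print(code, "->", partition_number(code, boundaries))
```

In this layout, any code above the last boundary also lands in the final partition. A newly added system would therefore need a new boundary (ALTER PARTITION FUNCTION ... SPLIT RANGE, after marking a NEXT USED filegroup on the scheme) before it gets a partition of its own.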

Does anyone think this is a bad idea? My aim is to get the data loaded into the data warehouse as quickly as possible. I have tried indexing, and the results were not great. I can't partition on date, because loads into the warehouse won't be driven by date.

Appreciate your help. Thanks.

Best Answer

The real use case for table partitioning is fast load and unload of data.

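If your warehouse tables and staging tables are in the same database and have the same schema, you will be able to switch a partition from staging to warehouse very quickly. ALTER TABLE ... SWITCH is a metadata-only operation, although the target partition must be empty and both partitions must sit on the same filegroup. Similarly, when data has reached its retention expiry date and must be purged, removing a whole partition is far faster than a row-by-row DELETE. You can do this by switching the partition out to an empty table, or with TRUNCATE TABLE ... WITH (PARTITIONS (...)) on SQL Server 2016 and later. This page and its ilk will give some pointers.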
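A switch is fast because it hands over ownership of the existing storage rather than copying rows. The toy Python model below illustrates that idea, along with two of the real preconditions: matching structure and an empty target partition. The class and method names are hypothetical and are not a SQL Server API. Real SQL Server checks much more than this sketch does, including indexes, constraints, and filegroup placement.

```python
class PartitionedTable:
    """Toy model of a partitioned table: each partition holds a list of rows."""

    def __init__(self, name, columns, partition_count):
        self.name = name
        self.columns = tuple(columns)
        self.partitions = {p: [] for p in range(1, partition_count + 1)}

    def insert(self, partition, row):
        self.partitions[partition].append(row)

    def switch_partition_to(self, partition, target):
        """Move a whole partition to the same-numbered partition of target."""
        # Simplified preconditions: only column lists and emptiness are checked.
        if self.columns != target.columns:
            raise ValueError("table structures differ; switch not allowed")
        if target.partitions[partition]:
            raise ValueError("target partition is not empty")
        # Hand over the existing storage object itself: no per-row copying.
        target.partitions[partition] = self.partitions[partition]
        self.partitions[partition] = []

    def truncate_partition(self, partition):
        """Purge a whole partition at once instead of deleting row by row."""
        self.partitions[partition] = []


# Hypothetical tables sharing the staging table's (abbreviated) columns.
cols = ("Id", "SystemCode", "TransactionDate", "ProcessStatus")
staging = PartitionedTable("Staging", cols, partition_count=40)
warehouse = PartitionedTable("Warehouse", cols, partition_count=40)

staging.insert(7, (1, 7, "2024-01-31", 0))
staging.switch_partition_to(7, warehouse)
print(len(staging.partitions[7]), len(warehouse.partitions[7]))
```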

Because each partition can be directed to a different filegroup, your backup and recovery cycles could be shortened, at the cost of increased complexity. For example, load one company's data, process it, then take a backup of the file(s) that hold that company's data. Perhaps several companies could be processed in parallel, on the expectation that partitioning will reduce file contention?
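One way to picture the filegroup idea is that the partition scheme lists one filegroup per partition. The set of filegroups to back up after a load then follows directly from which SystemCodes were loaded. This Python sketch assumes, hypothetically, one filegroup per system named `FG_Sys01` to `FG_Sys40`, with RANGE RIGHT boundaries 2 to 40. In T-SQL, the resulting names would feed `BACKUP DATABASE ... FILEGROUP = ...` statements.

```python
from bisect import bisect_right


def filegroups_for_systems(system_codes, boundaries, filegroups):
    """Return the sorted filegroups holding the given SystemCodes (toy model).

    filegroups[i] is the filegroup of partition i + 1, mirroring the ordered
    list in CREATE PARTITION SCHEME ... TO (...). Boundaries are RANGE RIGHT.
    """
    if len(filegroups) != len(boundaries) + 1:
        raise ValueError("need exactly one filegroup per partition")
    touched = set()
    for code in system_codes:
        partition = bisect_right(boundaries, code) + 1  # 1-based, RANGE RIGHT
        touched.add(filegroups[partition - 1])
    return sorted(touched)


# Hypothetical layout: 40 systems, one partition and one filegroup each.
boundaries = list(range(2, 41))
filegroups = [f"FG_Sys{n:02d}" for n in range(1, 41)]

# Systems loaded and processed in this batch (duplicates are harmless).
print(filegroups_for_systems([3, 7, 3], boundaries, filegroups))
```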

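Indexes can be partitioned as well as tables. When an index is aligned with its table's partition scheme, index maintenance can be performed one partition at a time (ALTER INDEX ... REBUILD PARTITION = n). This could reduce contention and the overall load on your system.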
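Per-partition maintenance usually means checking fragmentation partition by partition and touching only the partitions that need it. The fragmentation figures might come, for example, from the `partition_number` and `avg_fragmentation_in_percent` columns of `sys.dm_db_index_physical_stats`. The sketch below applies the commonly quoted rule of thumb: reorganize above 5%, rebuild above 30%. Those thresholds are an assumption to tune, not a fixed rule. The input dictionary stands in for the DMV output, and the index and table names in the printed statements are hypothetical.

```python
def partitions_to_maintain(fragmentation, reorganize_above=5.0, rebuild_above=30.0):
    """Split partitions into (reorganize, rebuild) lists by fragmentation %.

    fragmentation maps partition_number -> avg_fragmentation_in_percent.
    Thresholds are a rule-of-thumb assumption, not a SQL Server setting.
    """
    reorganize, rebuild = [], []
    for partition, percent in sorted(fragmentation.items()):
        if percent > rebuild_above:
            rebuild.append(partition)
        elif percent > reorganize_above:
            reorganize.append(partition)
        # Partitions at or below reorganize_above are left alone.
    return reorganize, rebuild


# Hypothetical fragmentation figures for four partitions of one index.
stats = {1: 2.0, 2: 12.5, 3: 45.0, 4: 30.0}
reorganize, rebuild = partitions_to_maintain(stats)

# Hypothetical index/table names; only the listed partitions are touched.
for p in reorganize:
    print(f"ALTER INDEX IX_Staging_SystemCode ON dbo.Staging REORGANIZE PARTITION = {p};")
for p in rebuild:
    print(f"ALTER INDEX IX_Staging_SystemCode ON dbo.Staging REBUILD PARTITION = {p};")
```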

Using partitions solely to improve query performance can be problematic. At a minimum, every query must include the partition key in its predicate for the optimiser to eliminate partitions. Sometimes the optimiser chooses not to perform partition elimination even then.
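Partition elimination can be pictured like this: if the WHERE clause pins the partition key, only the matching partitions need to be read. Otherwise, every partition is a candidate. This toy Python model uses hypothetical names and the same assumed 40-system RANGE RIGHT layout. It captures only the best case, so the real optimiser may still read more partitions than the model suggests.

```python
from bisect import bisect_right


def partitions_scanned(boundaries, key_values=None):
    """Partitions a query must read under ideal partition elimination (toy model).

    key_values: the SystemCode values pinned by the WHERE clause (for example
    WHERE SystemCode IN (5, 9)), or None if the predicate does not mention
    the partition key at all. Boundaries are RANGE RIGHT.
    """
    all_partitions = list(range(1, len(boundaries) + 2))
    if key_values is None:
        # No partition key in the predicate: nothing can be eliminated.
        return all_partitions
    return sorted({bisect_right(boundaries, v) + 1 for v in key_values})


boundaries = list(range(2, 41))  # hypothetical 40-partition layout
print(len(partitions_scanned(boundaries, {5, 9})))  # query filters on SystemCode
print(len(partitions_scanned(boundaries)))          # query does not
```

In the question's scenario, a warehouse load that filters only on ProcessStatus would fall into the second case and read all 40 partitions.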

Partitioning is no free lunch. It has costs as well as benefits. You must consider both sides and test well. As so often in DB design "it depends."