Sql-server – do partitions make sense when dates are not involved

partitioningsql server

We are having some discussions about partitioning some application tables in SQL Server 2008 based on their quickly growing size. One is an event table that is typically used for INSERTS (80% of the queries) or SELECTS that look for the primary key of the table (20% of the queries). The other table is a "mapping" table that is all SELECTS looking up the primary key.

Would partitions along the primary key be helpful here? I've done a lot of reading and all the classic examples of partitioning seem to be data warehouse tables that are partitioned on dates. It also seems that partitioning can sometimes cause more harm than good.

What do you think about these tables?

Best Answer

You did not explain why you want to partition the table and what do you expect from partitioning. You only mention table size, which is hardly a criteria for partitioning. Performance wise partitioning will make everything slower, not faster. The best you can hope for is on-par performance with the unpartitioned table. Some scenarios that do make sense with partitioning are:

  • ETL needs like load jobs that manipulate data intensively in staging tables and then switch in the entire staging table in one fast operation
  • need to remove large volumes of data that has past retention period (monthly switch out and truncate)
  • administrative reasons, like the need to rebuild individual partitions

Many quote reasons like 'move old data to slower disks' but I don't buy that argument much. And another oft quoted reason is to distribute data in multiple files, but that is incorrect thinking as a filegroup can contain multiple files across many volumes and the engine would distribute IO across them anyway w/o any partitioning need.

You mention that your developers quote a case when partitioning by data increased performance. Perhaps is a case of a time series data that was clustered by id, not by date and in which all range queries (typical for time series) had to do table scans. Partitioning appeared to help because partition elimination reduced the amount of data scanned. But a proper clustered index would had addressed the problem much better (no surprise there, indexing is the usually correct answer to query performance issues).

An extreme case when partitioning is useful is hash partitioning to help spread insert last page latch contention.

But when push come to shove the partitioning is an 'all-or-nothing' approach that has a very heavy impact (consider that you can no longer have an unique primary key that does not include the partitioning field anymore) and percolates through the data model everywhere (eg. a lot of foreign keys have to be re-designed). It requires careful administration. Query Optimizer can misfire some horrible plans in presence of partitioning.

Kendra Little has a nice article explaining pros and cons: How To Decide if You Should Use Table Partitioning