Sql-server – Best way to evenly distribute data over partitioned tables

sql serversql-server-2008

I am working with a database with roughly around 500MM records. We currently use a static partitioning method that partitions the data by an int identifier such at < 150MM goes on partition 1, > 150 && < 350 goes on 2 etc.

What dynamic partitioning schemes are available to better evenly distribute the data over the available partitions? Is there a way to allow for additional partitions to a dynamic scheme with minimal maintenance overhead?

Best Answer

Hashing is a common partitioning mechanism for your use case.. It is NOT available in SQL 2008: http://www.sqlservercentral.com/articles/partition/64740/.

The problem with hash partitioning is that if you add partitions, you need to redistribute the data. Logically speaking, any partitioning scheme that dynamically determines which partition the data goes into the data would need to redistribute the data after you add partitions (other than perhaps using a hash on the insert time + keeping track of when each partition was added). This assumes partitioning (distributing data via a predefined scheme to reduce subsequent lookup time) vs sharding (randomly distributing data to multiple boxes reduce processing time of the entire data set).

You can probably simulate a hash partition by generating the hash yourself on insert/update, and using a range based partition over the hashed value. For instance, lets say you want 10 partitions. Modulus divide a hash of your key value by 10, and range partition on the remainder (with each range taking one of the 10 possible values).