SQL Server – Partitioned Tables and Indexes – what are the downsides?

Tags: index, partitioning, performance, sql-server, sql-server-2014

I am asking about partitioned tables and indexes for tables with fewer than 100 partitions and no nonaligned indexes, where by "nonaligned index" I mean (quoting the documentation):

Nonaligned index

An index partitioned independently from its corresponding table. That is, the index has a different partition scheme or is placed on a separate filegroup from the base table.

Designing a nonaligned partitioned index can be useful in the following cases:

The base table has not been partitioned.

The index key is unique and it does not contain the partitioning column of the table.

You want the base table to participate in collocated joins with more tables using different join columns.

Are there any other performance drawbacks besides the following?

1 – Some DBCC commands run more slowly.

2 – Queries that use operators such as TOP or MAX/MIN on columns other than the partitioning column may experience reduced performance with partitioning because all partitions must be evaluated.

3 – Queries that use partition elimination could have comparable or improved performance with a larger number of partitions. Queries that do not use partition elimination could take longer to execute as the number of partitions increases.
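Item 2 can be illustrated with a toy Python model. It is a sketch under assumptions, not how SQL Server stores data: rows are partitioned on some column `a`, and each partition keeps its own ascending index on a hypothetical non-partitioning column `b`, as an aligned index would. Finding MAX(b) then needs one probe per partition plus a final comparison, whereas a single non-partitioned index needs one probe.

```python
# Toy model (assumption): rows are partitioned on some column `a`, and each
# partition keeps its own ascending index on another column `b`, as an
# aligned index would. The lists below stand in for those per-partition indexes.
from typing import List, Optional, Tuple


def max_b_partitioned(partitions: List[List[int]]) -> Tuple[Optional[int], int]:
    """Return (MAX(b), partitions probed) when the index on b is partitioned."""
    best: Optional[int] = None
    probed = 0
    for index in partitions:
        probed += 1  # b says nothing about a, so no partition can be skipped
        if index:  # the last entry of an ascending index is its maximum
            if best is None or index[-1] > best:
                best = index[-1]
    return best, probed


def max_b_unpartitioned(index: List[int]) -> Tuple[Optional[int], int]:
    """With a single index on b, MAX(b) is one probe at the end of the index."""
    return (index[-1] if index else None), 1


parts = [[5, 17, 91], [3, 44], [8, 60, 72]]
print(max_b_partitioned(parts))  # (91, 3)
print(max_b_unpartitioned(sorted(x for p in parts for x in p)))  # (91, 1)
```

The number of probes grows with the number of partitions, which is why TOP and MAX/MIN on non-partitioning columns get slower as partitions are added.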

Best Answer

Expanding upon your list, here are a couple of potential downsides that we have come across in real production workloads:

Seeking into multiple partitions

Expanding on the point that queries that do not use partition elimination could take longer to execute, there is one pattern that is particularly affected: singleton seeks. This operation becomes much slower if all (or even a modest subset) of the partitions need to be accessed. The skip scan operation essentially performs a seek into every partition that cannot be eliminated.
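A toy Python model of the skip scan, as a sketch under stated assumptions: a hypothetical RANGE RIGHT partition function given by sorted `boundaries`, rows of (partitioning column, index key), and each partition's aligned index stood in for by a sorted list. A predicate that includes the partitioning column lets the seek go to a single partition; a predicate on the index key alone has to seek into every partition.

```python
# Toy model (assumptions): a hypothetical RANGE RIGHT partition function with
# sorted `boundaries`, rows of (partitioning column, index key), and each
# partition's aligned index represented by a sorted list of index keys.
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def partition_of(pcol: int, boundaries: List[int]) -> int:
    """0-based partition number; under RANGE RIGHT a boundary value goes right."""
    return bisect_right(boundaries, pcol)


def build(rows: List[Tuple[int, int]], boundaries: List[int]) -> List[List[int]]:
    """Distribute rows into partitions, each holding a sorted index on the key."""
    parts: List[List[int]] = [[] for _ in range(len(boundaries) + 1)]
    for pcol, key in rows:
        parts[partition_of(pcol, boundaries)].append(key)
    return [sorted(p) for p in parts]


def seek(index: List[int], key: int) -> bool:
    """Binary search within one partition's index."""
    i = bisect_left(index, key)
    return i < len(index) and index[i] == key


def seek_with_elimination(parts, boundaries, pcol, key) -> Tuple[bool, int]:
    """Predicate includes the partitioning column: only one partition is probed."""
    return seek(parts[partition_of(pcol, boundaries)], key), 1


def skip_scan_seek(parts, key) -> Tuple[bool, int]:
    """Predicate on the index key only: seek into every partition."""
    hits = [seek(index, key) for index in parts]  # probe all, no short-circuit
    return any(hits), len(parts)


rows = [(d, d * 7 % 100) for d in range(100)]  # hypothetical sample rows
boundaries = [25, 50, 75]                      # four partitions
parts = build(rows, boundaries)
print(seek_with_elimination(parts, boundaries, 30, 10))  # (True, 1)
print(skip_scan_seek(parts, 10))                         # (True, 4)
```

Both calls find the same row, but the second one pays for a seek in each of the four partitions.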


Let's say that you have a billion-row table (N = 1,000,000,000) with rows divided equally into 1,000 partitions (P = 1,000). A single seek costs roughly log2(N) ≈ 30 comparisons in a non-partitioned table. However, the same seek in this hypothetical partitioned table costs roughly P × log2(N/P) = 1,000 × log2(1,000,000) ≈ 20,000 comparisons, because it has to seek separately into each of the P partitions, each holding N/P rows. So the seek now performs over 500x more work if data from all partitions is needed (or sometimes even if it isn't needed, but SQL Server can't prove that based on your query).
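The arithmetic above can be checked quickly in Python. Like the text, this sketch counts binary-search comparisons as a rough stand-in for the work a seek does; it ignores B-tree fan-out and page I/O.

```python
import math

N = 1_000_000_000  # rows in the table
P = 1_000          # partitions, rows split evenly

# One seek into a single index: about log2(N) comparisons.
unpartitioned = math.log2(N)

# Skip scan: one seek per partition, each into an index of N / P rows.
partitioned = P * math.log2(N / P)

print(f"non-partitioned seek: ~{unpartitioned:.0f} comparisons")
print(f"partitioned seek:     ~{partitioned:,.0f} comparisons")
print(f"ratio:                ~{partitioned / unpartitioned:.0f}x")
```

The ratio works out to 1,000 × log2(10^6) / log2(10^9) = 1,000 × 6/9 ≈ 667, which is where "over 500x" comes from.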

Note that this can come up both when you explicitly query the table for one row (or a small range of rows) and in more complex queries where the partitioned table appears on the inner side of a nested loops join. The good news is that SQL Server is reasonably good at taking this into account during cost-based optimization, but that still typically means you get a hash join where a nested loops join seeking into a non-partitioned table would have been far cheaper.

Thread skew in parallel query execution

In parallel query plans, threads are allocated to partitions. If one partition is much larger than the others, queries against the table may be particularly susceptible to thread skew: one thread can end up with too high a proportion of the rows and still be working long after the other threads have finished. This can happen with non-partitioned tables as well, but tables whose partition function distributes rows unevenly are particularly vulnerable.

See "Parallel Query Execution Strategy for Partitioned Objects" in the SQL Server documentation for a more detailed description of how threads are allocated to partitions. For example:

The query processor uses a parallel execution strategy for queries that select from partitioned objects. As part of the execution strategy, the query processor determines the table partitions required for the query and the proportion of threads to allocate to each partition. In most cases, the query processor allocates an equal or almost equal number of threads to each partition, and then executes the query in parallel across the partitions.
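The skew described above can be sketched with a simplified Python model of the quoted strategy. It assumes at least as many threads as partitions, spreads threads as evenly as possible across partitions, and assumes each partition's rows are split evenly among its own threads. These are illustrative assumptions, not SQL Server's exact scheduling.

```python
# Simplified model (assumptions): at least as many threads as partitions,
# threads spread as evenly as possible across partitions, and each
# partition's rows split evenly among the threads assigned to it.
from typing import List


def allocate_threads(num_partitions: int, dop: int) -> List[int]:
    """Threads per partition; the first dop % num_partitions get one extra."""
    if dop < num_partitions:
        raise ValueError("model assumes dop >= number of partitions")
    base, extra = divmod(dop, num_partitions)
    return [base + (1 if i < extra else 0) for i in range(num_partitions)]


def rows_per_thread(partition_rows: List[int], dop: int) -> List[float]:
    """Rows processed by each thread under the allocation above."""
    work: List[float] = []
    alloc = allocate_threads(len(partition_rows), dop)
    for rows, threads in zip(partition_rows, alloc):
        work.extend([rows / threads] * threads)
    return work


def skew(work: List[float]) -> float:
    """Busiest thread's load divided by the load under a perfectly even split."""
    return max(work) / (sum(work) / len(work))


even = [1_000_000] * 4
lopsided = [7_000_000, 1_000_000, 1_000_000, 1_000_000]
print(skew(rows_per_thread(even, 8)))      # 1.0
print(skew(rows_per_thread(lopsided, 8)))  # 2.8
```

With four equal partitions and eight threads, every thread gets the same share. If one partition holds 7 million of the 10 million rows, its two threads each process 3.5 million rows while the other six process 0.5 million each, so the query waits on threads carrying 2.8 times the average load.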