Sql-server – what could be the fastest way to delete all data from a partition of a partitioned table

deleteoptimizationpartitioningsql serversql server 2014

I have a partitioned table that in real life has 80 million rows.

for testing purposes I have created and partitioned this table here.

when I run the following query:

select * from countries
where visit >= '20110101'
  and visit <= '20111231'

as you can see in the query plan here and on the picture below, it uses partition elimination so I know I am doing something right.

I am aware that partition generally is not meant to speed up my queries, it is a management feature, however, it can speed up queries on large tables.

I will start by stating what I don't want.
I don't want to remove any partition from my table.

what I want?

I want to delete all data from a partition in the quickest possible way:

can something be quicker than this?
without considering deleting in batches

BEGIN TRANSACTION T1

DELETE 
FROM dbo.countries WITH (TABLOCKX)
WHERE visit >= '20110101'
  AND visit <= '20111231'

--COMMIT TRANSACTION T1

query plan is here

Best Answer

SQL Server 2014 unfortunately doesn't support TRUNCATE on a partition. Either drop and recreate it or switch it out.

See longer discussion here.

SQL Server 2016 does support truncating partitions. If you're on that version, that's definitely your fastest option.

Related Solutions

Sql-server – Partitioned Tables and Indexes – what are the downsides

Expanding upon your list, here are a couple potential downsides that we have come across in real production workloads:

Seeking into multiple partitions

Expanding on the queries that do not use partition elimination could take longer to execute point, there is a specific pattern that is particularly affected: singleton seeks. This operation will become much slower if all (or even a modest subset of) partitions need to be accessed. The skip scan operation essentially performs a seek into every partition that cannot be eliminated.

Let's say that you have a billion row table (N = 1,000,000,000) with rows divided equally into 1,000 partitions (P = 1,000). A single seek is roughly O(log(N)) ~ 30 in a non-partitioned table. However, this same seek operation becomes roughly O(P*log(N/P)) ~ 20,000 in this hypothetical partitioned table. So the seek now performs over 500x more work if data from all partitions is needed (or sometimes even if it isn't needed, but SQL can't prove that based on your query).

Note that this can come up both when you explicitly query the table for one row (or a small range of rows) and in more complex queries when the partitioned table appears in the innder side of a loop join. The good news is that SQL Server is reasonably good about taking this into account in cost-based optimization, but that still typically means that you get a hash join when a loop-seek into a non-partitioned table would have been far more optimal.

Thread skew in parallel query execution

In parallel query plans, threads are allocated to partitions. If there is one partition that is much larger than the others, queries against the table may be particularly susceptible to thread skew. It's possible that one thread gets too high a proportion of rows and is processing long after the other threads have done their work. This situation can happen with non-partitioned tables as well, but any partition functions that do not equally distribute rows are particularly vulnerable.

See Parallel Query Execution Strategy for Partitioned Objects for a more detailed description of the allocation of threads to partitions. For example:

The query processor uses a parallel execution strategy for queries that select from partitioned objects. As part of the execution strategy, the query processor determines the table partitions required for the query and the proportion of threads to allocate to each partition. In most cases, the query processor allocates an equal or almost equal number of threads to each partition, and then executes the query in parallel across the partitions.

Sql-server – Is it possible to force the optimizer to eliminate irrelevant tables in this partitioned view

Enable trace flag 4199.

I also had to issue:

UPDATE STATISTICS dbo.ObservationDates 
WITH ROWCOUNT = 73049;

to get the plans shown below. Statistics for this table were missing from the upload. The 73,049 figure came from the Table Cardinality information in the Plan Explorer attachment. I used SQL Server 2014 SP1 CU4 (build 12.0.4436) with two logical processors, maximum memory set to 2048 MB, and no trace flags aside from 4199.

You should then get an execution plan that features dynamic partition elimination:

select 
    od.[Year], 
    AvgValue = avg(ObservationValue)
from dbo.v_Observation o 
join dbo.ObservationDates od
    on o.ObservationDateKey = od.DateKey
where 
    od.[Year] >= 2000 and od.[Year] < 2006
group by 
    od.[Year]
option (querytraceon 4199);

Plan fragment:

This might look worse, but the Filters are all start-up filters. An example predicate is:

Per iteration of the loop, the start-up predicate is tested, and only if it returns true is the Clustered Index Seek below it executed. Hence, dynamic partition elimination.

This is perhaps not quite as efficient as static elimination, especially if the plan is parallel.

You may need to try hints like MAXDOP 1, FAST 1 or FORCESEEK on the view to get the same plan. Optimizer costing choices with partitioned views (like partitioned tables) can be tricky.

The point is you need a plan that features start-up filters to get dynamic partition elimination with partitioned views.

Queries with embedded USE PLAN hints: (via gist.github.com):

Best Answer

Related Solutions

Sql-server – Partitioned Tables and Indexes – what are the downsides

Sql-server – Is it possible to force the optimizer to eliminate irrelevant tables in this partitioned view

Related Question