Sql-server – Running parallel delete statements

deletesql serversql-server-2012

Problem: I have a massive table – occupies a whopping 9.5 TBs. I need to delete around 5 mil rows from it. The best method I have come up with deletes 1 million records in 2 hours! Since I can not run this during peak time, I get only 2 hour window everyday and deleting 5 mil rows would take me 4 working days at this rate.

Question: I was wondering if SQL Server has a feature where I could run 2 or more deletes on a table simultaneously. Possibly something like virtually partitioning the table – not sure.

I also want to add that running my script induces page locks but keeps the transaction log from growing. My script is very similar to the one suggested by vonPryz:

How to Delete From Large Tables in Batches Without Growing the Transaction Log by Greg M Lucas.

Best Answer

Let me try to summarise the answers as per the commenters above.

First of all, to achieve higher deletion speeds you can do three things:

  1. Delete a higher number of rows at a time
  2. Use partitioning
  3. Run multiple deletes

Ad 1) Since deleting a large number of rows at a time hit the transaction log hard and causes locking - you likely want the number of deletes in a statement to be "small". I find that around 10K rows is a good number (and having table lock escalation turned off with ALTER TABLE Foo SET LOCK_ESCALATION = Disabled

Ad 2) If your table is not already partitioned, this is not an option

Ad 3) DELETE in SQL Server is single threaded, so you need to run more than one at a time to get max speed. To run multiple deletes concurrently, you need a way to partition them so each parallel delete runs on its own set of keys and doesn't block with the others. Typically, you can use the primary key of the table to partition the delete.

For example, if you have an IDENTITY column called key on the table, you can first:

SELECT MAX(key) - MIN(key), MIN(key) FROM Foo WHERE <rows that must be deleted>

Keep the Max - Min and Min somewhere (a table in tempdb) that you can quickly read from into variables @IntervalSize, @MinKey respectively.

Let us say you decide that you can allocate 4 cores for running deletion. You now run 4 queries, either from 4 x SSMS new query or four command prompt via SQLCMD, each doing this:

DECLARE @NumDone INT = 1
WHILE @NumDone > 0 BEGIN
  SET ROWCOUNT 10000

  DELETE FROM Foo WHERE Key BETWEEN @MinKey + @IntervalSize / 4 * @n 
     AND @MinKey + @IntervalSize / 4 (@n + 1)
  SET @NumDone = @@ROWCOUNT
END

Pick @n = 0,1,2,3 for each of the concurrently running DELETES.