SQL Server – Managing and removing data from a huge table

Tags: deadlock, delete, sql-server, table

I have the following scenario:

A huge table that is also by far the most used table in the system.
By huge I mean huge both in number of rows and in actual disk size, because it has 2 byte-array columns that store files from our system.

The problem is when I need to remove a large number of rows from it.
First I tried just a single DELETE with an IN clause, but it froze the whole table (making the system unusable along with it) for about 45 minutes, at which point I decided to restart the database.

Then I tried deleting in chunks of 10 rows, aiming to remove 8k+ rows in total. It ran really slowly, everything that read from that table slowed down too, and when it had deleted about 700 rows, I started another process that used the same table and got a deadlock.
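
Roughly, the two attempts looked like this (table, column and staging names are simplified placeholders, not the real schema):

-- Attempt 1: a single statement over all the rows to remove
DELETE FROM dbo.big_table
WHERE key_column IN (SELECT key_column FROM dbo.keys_to_remove);

-- Attempt 2: chunks of 10 rows, repeated until nothing is left to delete
DECLARE @rc int = 1;
WHILE @rc > 0
BEGIN
  DELETE TOP (10) FROM dbo.big_table
  WHERE key_column IN (SELECT key_column FROM dbo.keys_to_remove);

  SET @rc = @@ROWCOUNT;
END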

How should I make these mass deletions from this table?
I thought about splitting the table in two: one table for the 2 byte arrays, which account for most of the table's volume, and another with the rest of the information (simple things like varchars, integers, datetimes and so on), something like the sketch below. Would doing this help in these deletion scenarios, or would the number of rows still cause these locks on the table?
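
The split I have in mind would look roughly like this (names and types are only illustrative):

CREATE TABLE dbo.big_table_core
(
  key_column   int          NOT NULL PRIMARY KEY,
  some_varchar varchar(200) NULL,
  some_int     int          NULL,
  some_date    datetime     NULL
  -- ... the rest of the small columns
);

CREATE TABLE dbo.big_table_files
(
  key_column int            NOT NULL PRIMARY KEY
    REFERENCES dbo.big_table_core (key_column),
  file_1     varbinary(max) NULL,
  file_2     varbinary(max) NULL
);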

The version is Microsoft SQL Server 2014 Standard (64-bit), 12.0.5000.0.

Best Answer

First off, a couple of articles:

You didn't mention the version of SQL Server, so I'm not sure if the second one is relevant to you, but it might be to other readers. I'll focus on the former.

Typically, deleting a lot of data causes a wide array of symptoms:

  1. more rows affected = greater likelihood of lock escalation (a quick way to check for this is sketched just after this list)
  2. longer transactions = lengthy blocking (and sometimes deadlocks, as you've found)
  3. heavy writes to transaction log = write latency (including growth events)
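
As a side note, if you want to confirm whether lock escalation is actually happening on that table, a query along these lines (a standard DMV; the table name is a placeholder) shows promotion attempts and successful promotions:

SELECT OBJECT_NAME(ios.object_id) AS table_name,
       ios.index_id,
       ios.index_lock_promotion_attempt_count,
       ios.index_lock_promotion_count
  FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID(N'dbo.big_table'), NULL, NULL) AS ios;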

So my idea in that article was to lock fewer rows by deleting in "chunks" - each delete operation (or a finite number of them) would be in a transaction, thereby reducing the number of locked rows, the length of the transaction, and the impact to the log.

So I might suggest trying something like this:

SELECT key_column
  INTO #work
  FROM dbo.big_table
  WHERE -- however you identify rows to delete;

CREATE CLUSTERED INDEX x ON #work(key_column);

DECLARE @rc  int = 1, 
  @counter   int = 1,
  @batchsize int = 100; -- this may or may not be an optimal chunk size

WHILE @rc > 0
BEGIN
  BEGIN TRANSACTION;

  DELETE TOP (@batchsize) t
    FROM dbo.big_table AS t
    INNER JOIN #work AS x
    ON t.key_column = x.key_column;

  SET @rc = @@ROWCOUNT;

  COMMIT TRANSACTION;

  SET @counter = @counter + 1;

  IF @counter % 10 = 0 -- or maybe 100 or 1000
  BEGIN
    CHECKPOINT; CHECKPOINT;
  END
END

The checkpoints make sure that the log space used by each transaction can be re-used (the second one helps ensure the log properly "wraps around"). In your full recovery environment, you'd just need a single BACKUP LOG command there instead... or you might consider switching to simple recovery if you can time this maintenance work very closely to your regularly scheduled full backup, where you can restart the log chain.
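
For example, in full recovery the maintenance block inside the loop could look something like this (database name and backup path are placeholders):

IF @counter % 10 = 0 -- same cadence you'd use for the checkpoints
BEGIN
  -- appends a new backup set to the file each time (default NOINIT)
  BACKUP LOG YourDatabase
    TO DISK = N'X:\LogBackups\YourDatabase_purge.trn';
END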

The trick is to find the sweet spot that balances reuse of the log against the work the checkpoint or log backup has to do. (In my article, the frequency of checkpoints actually made the overall operation slower, which might be OK: it doesn't really matter how long this takes if it truly acts like a background process, since the problem isn't duration, it's interference. I just didn't fish around for the sweet spot.)

The batch size effectively limits the number of rows locked within any given transaction and reduces the amount of work required by that transaction. You can also use delayed durability here if you have a modern enough version (SQL Server 2014 or later, so your 12.0.x build qualifies); the biggest risk is that you could lose a committed transaction after a crash, but since what you'd "lose" is data removal, you haven't really lost anything.
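
A minimal sketch of the delayed durability option (database name is a placeholder):

-- allow delayed durable commits at the database level
ALTER DATABASE YourDatabase SET DELAYED_DURABILITY = ALLOWED;

-- then, inside the loop, commit each chunk without waiting for the log flush to complete
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);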