Sql-server – Deleting Rows is Increasing Index Size

archivedeleteindexsql-server-2008-r2ssis

We have a very large table in a SQL Server 2008 R2 Standard server which I am archiving rows by copying them into another database on a separate disk and then deleting them from the original table using an SSIS data flow. The table has a bigint Primary Key and rows are being deleted in numerical order of this key. As I am deleting rows, however, the overall Index size of the table is steadily increasing, and I cannot figure out why. The data size is remaining the same throughout this process.

Here are the table details:

Row Count: ~300,000,000
Data Size: ~65 GB
Index Size: ~65 GB

Rows are being deleted out at a rate of ~350,000 per hour.

The column definitions use the following data types: smallint, int, bigint, char, varchar, nvarchar, uniqueidentifier, bit, datetime

The table has one Primary Key which is the Clustered key on the table as well as 4 non-clustered indexes.

The table is also part of a replication maintenance plan. The replicated copy of the table is decreasing in both Data Size and Index Size!

I performed the same process on another table last week and I could see both the Data Size and Index Size of that table decreasing as I deleted rows.

Is there any explanation as to why the Index Size keeps going up in this case?

Best Answer

Alright so I believe you doing and INSERT is increasing the size then after you do your OLEDB command which is sending hundreds of thousands of DELETE commands to SQL Server is what's causing your problem. So here is a short Demo/Tutorial on how to do this DELETE operation more effectively. Even if this doesn't solve your index problem it will make your package run more efficiently.

STEP 1: Build a staging table for this information.

You're going to want to CREATE a table that's sole purpose is to hold the columns necessary for the delete. Whatever makes up your Primary Key will do.

STEP 2: Truncate the Staging table.

Create Execute SQL Task

It's important to truncate first as this will all for re-run if the package fails.

Edit SQL Task

Step 3: Data Flow task.

Next add a Data Flow Task and open.

Add DFT

If you are doing your archive elsewhere than disregard the Multicast and the Archive destination in the next image.

DFT Instructions

STEP 3: Delete

Delete

Delete Statement

That should give you a pretty good idea of how to do this if you have any other question feel free to ask.

The reason this will be more efficient is because you are then doing set based operations instead of sending 350K DELETE commands to SQL server.