Sql-server – Deleting rows from big table: what then

sql serversql-server-2016

I will be deleting 40% of rows from a table that currently has 2.5 millions rows, space reserved 17 GB, 150 MB index, unused space 3 MB.

The primary key is the identity, and records were inserted in order. There are 2 non-clustered indexes. The rows I delete are the oldest. There is a FK pointing to the table, but it is not a problem because it only refers to recent records (which will not be deleted). Columns are mostly nvarchar, including a few nvarchar(max).

The actual deletion will take less than 10 minutes, the database server will not be used by anyone else for a couple of hours, so I am not worried by the deletion itself.

To get the maximum advantage from the point of view of disk size and performances, which operations can I do after the deletion? I don't want to create a new table and move the data, but is there a way I can rearrange the data and reclaim the space?

Best Answer

Rebuild the table

ALTER TABLE yourtablename REBUILD

Related Solutions

Freeing Unused Space in SQL Server Table – Best Practices

The only thing we have done is replaced ScanImage on every row with a much smaller image (this is how so much unused space is there)

From doing some experimentation the most space effective method would be to drop the allocation unit and repopulate it (if you have a maintenance window to do this in).

Example code that achieved the best space reduction for me with the table structure in the question is:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SET XACT_ABORT ON;

BEGIN TRAN

SELECT [ImageID],
       [ScanImage]
INTO   #Temp
FROM   [dbo].[MyTableName]

ALTER TABLE [dbo].[MyTableName]
  DROP COLUMN [ScanImage]

/*Allocation unit not removed until after this*/
ALTER INDEX PK_Image ON MyTableName REBUILD

ALTER TABLE [dbo].[MyTableName]
  ADD [ScanImage] IMAGE NULL

UPDATE [dbo].[MyTableName]
SET    [ScanImage] = T.[ScanImage]
FROM   [dbo].[MyTableName] M
       JOIN #Temp T
         ON M.ImageID = T.[ImageID]

DROP TABLE #Temp

COMMIT

Everything is in a transaction so if the machine crashes it will be rolled back. Could probably do with some error handling or at least SET XACT_ABORT ON. I used SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; to prevent any concurrent modifications from happening during or after the copy and being lost.

The number of LOB pages reserved after reducing the size of an image in all rows was as follows:

+--------------------------------------------------+---------------------+-------------------------+
|                      Event                       | lob_used_page_count | lob_reserved_page_count |
+--------------------------------------------------+---------------------+-------------------------+
| Inserted 10,000 rows with 100,000 byte data each |              135005 |                  135017 |
| Updated all rows to 10,000 byte image data       |               31251 |                  135012 |
| Reorganize                                       |               23687 |                   25629 |
| Drop and re-add image data                       |               13485 |                   13489 |
+--------------------------------------------------+---------------------+-------------------------+

SQL Server 2005 – Deleting Rows with Limited Expansion Space

As things stand, the query:

DELETE TOP (4000)
FROM   [OLD_TABLE] WITH (TABLOCKX)
WHERE  [Date] < '2014-01-01 00:00:00'

...has to scan the whole heap table, testing each row it finds, until it eventually finds 4000 to delete. On the next iteration, the whole business starts again from square one. Assuming the scanning process is performed in the same (allocation unit) order each time, these scans will take longer and longer to find 4000 rows as time goes by.

Creating an index on the [Date] column will allow SQL Server to find the 4000 rows much more efficiently. It does add a small overhead to each delete (as the new nonclustered index needs to be maintained as well) but this is nothing compared to the effort that will be saved by not performing a scan each time. The index will also require a certain amount of space, but it should not be too large. You should create the following index before resuming your data removal process:

CREATE NONCLUSTERED INDEX <index_name>
ON dbo.OLD_TABLE ([Date]);

By the way, if the database has snapshot isolation or read committed snapshot isolation enabled (even if not actively used!), the TABLOCKX hint will not be enough to ensure that empty heap pages are deallocated. Your heap table may therefore contain many empty pages - a concern since you are so low on space.

The standard way to address this space management issue is to create a clustered index (or issue an ALTER TABLE REBUILD statement, but that requires SQL Server 2008). It seems that creating a clustered index might also not be an option for you, due to space constraints. There isn't an obvious way to resolve this right now, given the space issue.

One thing to keep an eye on while the delete is progressing is the space used by the transaction log. If the database is in FULL or BULK_LOGGED recovery, you will need to keep on top of backing up the log. If the database is using SIMPLE recovery, you may need to issue a manual CHECKPOINT from time to time to release transaction log space for reuse (otherwise the physical file might grow).

If you can get hold of some additional temporary storage, a better way might be to bulk export the data you want to keep to a safe location, drop the table, recreate it (preferably with a clustered index!) and reload the saved data. This is generally faster than the incremental-delete process, but it depends on your objectives and priorities.

Best Answer

Related Solutions

Freeing Unused Space in SQL Server Table – Best Practices

SQL Server 2005 – Deleting Rows with Limited Expansion Space

Related Question