SQL Server – Deleting rows from a table with a very small amount of expansion space

Tags: delete, sql-server, sql-server-2005, transaction-log

I have multiple tables whose data I have backed up, and I am now deleting the old rows. For the time being, archiving is not an option due to space constraints. I have tried running the following:

WHILE 1 = 1 
BEGIN 
    DELETE TOP ( 4000 )
    FROM    [OLD_TABLE] WITH (TABLOCKX)
    WHERE   [Date] < '2014-01-01 00:00:00'
    IF @@ROWCOUNT = 0 
        BREAK    
END

This seems to be working, but very slowly… 17+ hours so far, and only ~140,000 rows deleted.

The kicker here is that this table, and the others that need to be reduced, are very large: around 50-60 million rows, taking up ~16-17 GB. There is only 20 GB of free space left on the physical disk. I have looked at putting the data that I want to keep into an intermediate table, truncating the original, and repopulating it, but I am afraid we will run out of disk space while those operations are running. This is a production database. I have inherited this mess and am trying to get the database cleaned up so we can stay functional until we can get new hardware in.

Are there any methods that I (a relative newcomer to database management) have not come across that may allow me to do this in a faster, more efficient manner?

*The table and column names have been anonymized.

EDIT

Query Execution Plan

Best Answer

As things stand, the query:

DELETE TOP (4000)
FROM   [OLD_TABLE] WITH (TABLOCKX)
WHERE  [Date] < '2014-01-01 00:00:00'

...has to scan the whole heap table, testing each row it finds, until it eventually finds 4000 to delete. On the next iteration, the whole business starts again from square one. Assuming the scanning process is performed in the same (allocation unit) order each time, these scans will take longer and longer to find 4000 rows as time goes by.

Creating an index on the [Date] column will allow SQL Server to find the 4000 rows much more efficiently. It does add a small overhead to each delete (as the new nonclustered index needs to be maintained as well) but this is nothing compared to the effort that will be saved by not performing a scan each time. The index will also require a certain amount of space, but it should not be too large. You should create the following index before resuming your data removal process:

CREATE NONCLUSTERED INDEX <index_name>
ON dbo.OLD_TABLE ([Date]);

By the way, if the database has snapshot isolation or read committed snapshot isolation enabled (even if not actively used!), the TABLOCKX hint will not be enough to ensure that empty heap pages are deallocated. Your heap table may therefore contain many empty pages - a concern since you are so low on space.
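You can confirm both conditions yourself. The following sketch (the table name comes from the anonymized example above) checks the current database's snapshot settings and samples the heap to estimate how much of its allocated space is empty:

    -- Check whether either snapshot-based isolation level is enabled:
    SELECT name,
           snapshot_isolation_state_desc,
           is_read_committed_snapshot_on
    FROM   sys.databases
    WHERE  name = DB_NAME();

    -- Estimate how full the heap's pages are. SAMPLED keeps the
    -- scan cheap on a 50-60 million row table; index_id 0 = heap:
    SELECT page_count,
           record_count,
           avg_page_space_used_in_percent
    FROM   sys.dm_db_index_physical_stats(
               DB_ID(), OBJECT_ID(N'dbo.OLD_TABLE'), 0, NULL, 'SAMPLED');

A low avg_page_space_used_in_percent combined with a high page_count after the deletes would indicate the table is holding on to many mostly-empty pages.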

The standard way to address this space-management issue is to create a clustered index (or to issue an ALTER TABLE ... REBUILD statement, though that requires SQL Server 2008 or later). Given how low you are on space, creating a clustered index may not be an option for you either; there is no obvious way around that constraint right now.
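For reference, both options look like this (index name is hypothetical; remember that either one needs free space roughly on the order of the table's size while it runs):

    -- SQL Server 2008 and later only: rebuild the heap in place,
    -- deallocating the empty pages:
    ALTER TABLE dbo.OLD_TABLE REBUILD;

    -- On SQL Server 2005, creating a clustered index achieves the
    -- same compaction (and gives future deletes an ordered structure
    -- to work against):
    CREATE CLUSTERED INDEX CX_OLD_TABLE_Date
    ON dbo.OLD_TABLE ([Date]);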

One thing to keep an eye on while the delete is progressing is the space used by the transaction log. If the database is in FULL or BULK_LOGGED recovery, you will need to keep on top of backing up the log. If the database is using SIMPLE recovery, you may need to issue a manual CHECKPOINT from time to time to release transaction log space for reuse (otherwise the physical file might grow).
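A minimal sketch of that housekeeping, assuming placeholder database and backup-file names, which you could run (or schedule) between delete batches:

    -- FULL or BULK_LOGGED recovery: back up the log so the space
    -- inside it can be reused (path and names are placeholders):
    BACKUP LOG [YourDatabase]
    TO DISK = N'X:\Backups\YourDatabase.trn';

    -- SIMPLE recovery: a manual checkpoint releases log space instead:
    CHECKPOINT;

    -- Either way, watch how full the log is while the loop runs:
    DBCC SQLPERF(LOGSPACE);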

If you can get hold of some additional temporary storage, a better approach might be to bulk-export the data you want to keep to a safe location, drop the table, recreate it (preferably with a clustered index!), and reload the saved data. This is generally faster than the incremental-delete process, but it depends on your objectives and priorities.
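One way to sketch that export-and-reload inside T-SQL, assuming the temporary storage is attached where the server can write to it and using a hypothetical KEEP_TABLE as the holding area (bcp out/in to flat files is the other common route):

    -- 1. Copy the rows to keep into a holding table; ideally this
    --    lands on a filegroup placed on the spare drive:
    SELECT *
    INTO   dbo.KEEP_TABLE
    FROM   dbo.OLD_TABLE
    WHERE  [Date] >= '2014-01-01';

    -- 2. Drop the old heap and recreate it, this time clustered:
    DROP TABLE dbo.OLD_TABLE;
    -- CREATE TABLE dbo.OLD_TABLE (... original column definitions ...);
    CREATE CLUSTERED INDEX CX_OLD_TABLE_Date
    ON dbo.OLD_TABLE ([Date]);

    -- 3. Reload the saved rows and clean up the holding table:
    INSERT INTO dbo.OLD_TABLE SELECT * FROM dbo.KEEP_TABLE;
    DROP TABLE dbo.KEEP_TABLE;

Note that step 3 is fully logged, so the transaction-log advice above applies here too.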