Sql-server – Large delete query seems to have frozen

deletesql serversql-server-2008-r2

We ran a delete query on a database with 1.8bn rows. This delete would delete 1.2bn rows.

In hindsight, we would have broken up this query into 100m at a time but we are in a position where it has been running for 24 hours and the log file is at 2Tb which appears to be the maximum size allowed for a log file.

The database is in SIMPLE recovery mode.

Is there any saving this query? Or do we need to just restart the SQL Server and see what happens? Will the database be unusable? Is there anything we can do to kill this off as cleanly as possible?

Best Answer

First of all, check the SQL errorlog to see if it actually hit a max size for the log. If it did, then the query has no hope of completing, it is probably already in a rollback state.

Even if it is, I always prefer to kill the spid manually (use sp_who2 or sp_WhoIsActive to find the spid, then do a kill 59 or whatever). You also can't check the rollback status unless you do an explicit KILL, see this related thread.

Since this is a delete, and not an update or insert, you may be very lucky and find that it rolls back immediately. If not, it may take as long (or longer) to roll back as it did to get to this point.

To see the rollback status, use

kill 59 with statusonly

Unfortunately, I have found this frequently does not show anything useful, just a "0% complete". In that case, you'll have to use sp_who2 and watch the IO and CPU to see if it is still doing something.

Regarding the reboot, this is a grave risk. If the spid is actively rolling back (CPU and IO are changing), then restarting SQL will only take the database offline entirely until the rollback is completely finished (hours and hours). But, if the CPU and IO are not moving, then it may in fact clear it right away. Either way, it is a risk.

One final option, if things are especially dire: If you have a backup from just before the delete started (and there haven't been other updates to the db), then the fastest way to recover may be to simply drop the DB, restart SQL, and restore from backup.

If you can't drop the DB (or if you already restarted the instance and the sql errorlog is predicting a 24-hour recovery time), then shut down SQL services, delete the MDF and LDF files from disk, start up SQL, drop the (ghost) database, and restore from backup.

Obviously you'd only attempt that if this were a back-end processing database that users didn't interact with.

Related Solutions

Sql-server – Dealing with large log files

If the database is truly static in nature, putting it into read-only mode would prevent log growth. The queries that are causing the growth would fail however, so this might not actually be what you are looking for.

Put the database into simple recovery mode. That way the transaction logs will be routinely truncated. When in full mode, your logs will continue to grow until they are backed up.

USE [master]
GO
ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE WITH ROLLBACK IMMEDIATE
GO

SQL Server Index – How to Fix Endless Growth

You're confusing a few terms, I think. The ndf file is not an "index file", it is just another database data file to house the data.

It sounds like you have two database data files (the MDF and NDF). With the size of your MDF being so small, my guess would be that your database has two different filegroups (the PRIMARY filegroup using the MDF, and the "other" filegroup using the NDF file).

You need to do a little discovery here.

What are the filegroups in your database?

select 
    name,
    type_desc,
    is_default
from sys.filegroups;

Which database files belong to which filegroups?

select
    file_name = df.name,
    df.physical_name,
    filegroup_name = f.name
from sys.database_files df
inner join sys.filegroups f
on df.data_space_id = f.data_space_id;

And which data space do all of the allocation units live in, and how much are they?

select
    data_space_name = ds.name,
    total_pages = sum(au.total_pages),
    total_pages_mb = 
        convert(decimal(11, 2), sum(au.total_pages) * 8.0 / 1024)
from sys.allocation_units au
inner join sys.data_spaces ds
on au.data_space_id = ds.data_space_id
group by ds.name;

My guess is that you have two filegroups, PRIMARY and another one. The other filegroup probably has the NDF file that you are looking at linked up with it. And maybe even the "other" filegroup is the default filegroup. This is actually a prudent approach, so that you keep user objects off of the PRIMARY filegroup.

Why do you think this is a problem? Maybe you just have 57.3 GB worth of data in this database now. Don't be thrown off by the differing sizes of the MDF and NDF (unless they are in the same filegroup, in which case update your question with the specifics from my diagnostic queries above).

EDIT

Every time I try to reorganise or rebuild indexes on this database, all available RAM is eaten up for this operation, the server becomes almost unresponsive and IO activity is incredibly high

You should only be selectively rebuilding/reorganizing your indexes based off of their fragmentation levels. As for the memory consumption, there are ways to minimize that impact such as limiting by altering the max server memory server configuration so that SQL Server doesn't over consume memory and then battle with the OS.

Best Answer

Related Solutions

Sql-server – Dealing with large log files

SQL Server Index – How to Fix Endless Growth

Related Question