SQL Server – faster way of moving data between databases than CTEs

archive | cte | performance | sql-server

I use the following query on SQL Server 2012 Express to move data older than 4 days out of one [primary] database and into another [secondary] database for archiving purposes. The only issue is that this effectively takes the database offline, as it is usually moving about 500k rows of data, and that figure is climbing as more data points are added to the database.

The result is that my web-based app cannot access the database for (in most cases) about 2 hours, which stalls a lot of other processes as well as the app itself.

DECLARE @4daysago datetime
SELECT @4daysago = DATEADD(d, -4, GetDate());

SET IDENTITY_INSERT [activetrackarchivedb].dbo.[Data Import] ON;

--Transfer from current (production) DB to Archive DB
WITH CTE as (
    SELECT TOP 1000000 *
    FROM [activetrackdb].dbo.[Data Import] 
    WHERE [activetrackdb].dbo.[Data Import].[Receive Date] < @4daysago
    ORDER BY [Receive Date] ASC)
DELETE CTE
  OUTPUT DELETED.id, 
  DELETED.[Company id], 
  DELETED.[Site id],
  DELETED.[Site name],
  DELETED.[Receive date],
  DELETED.[Detect date]
  INTO  [activetrackarchivedb].dbo.[Data Import] 
  (id, 
  [Company id], 
  [Site id],
  [Site name],
  [Receive date],
  [Detect date]);

SET IDENTITY_INSERT [activetrackarchivedb].dbo.[Data Import] OFF;

Is there a better method I could use to 'transfer' these rows? Even a slower method would be acceptable if it at least kept the database accessible. This one took me a week to implement (I'm a novice) and involved the StackOverflow community's help. It's been great up until now, but as the amount of data has increased, it's become a very cumbersome process.

This is not for backup purposes either. The database the rows are transferred to simply has a higher capacity, and the archived data is still used in the production environment (moving it just keeps the primary database more maintainable and responsive when combined with other maintenance and indexing scripts).

Any help would be greatly appreciated.

Best Answer

This doesn't have anything to do with using a CTE specifically, as your question title implies. It is probably a combination of one or more of the following:

  • an expensive scan (or a seek plus a million key lookups) to identify the rows to delete (I assume [Receive Date] is not the clustering key; see the index sketch after this list)
  • many underlying non-clustered indexes that also must be updated
  • blocking (both on the delete and on the target/archive table)
  • a batch size that is too large, causing excessive transaction log activity (which can be especially bad if the transaction log is poorly configured)
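
For the first point, a non-clustered index on [Receive Date] would let the delete seek straight to the oldest rows instead of scanning the whole table. A minimal sketch, assuming the index is worth its extra write cost on your inserts (the index name here is illustrative):

USE [activetrackdb];
GO

-- Hypothetical supporting index so the archive delete can seek on
-- [Receive Date] rather than scan the whole table.
CREATE NONCLUSTERED INDEX IX_DataImport_ReceiveDate
    ON dbo.[Data Import] ([Receive Date]);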

See this blog post, but generally you may want to delete in smaller chunks and inject some artificial delays and/or checkpoints / log backups between batches.
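
A minimal sketch of that pattern, reusing the table and column names from your query; the 5,000-row batch size and 5-second delay are illustrative values to tune against your workload, and the checkpoint / log-backup comments depend on your recovery model:

DECLARE @4daysago datetime = DATEADD(DAY, -4, GETDATE());
DECLARE @rows int = 1;

SET IDENTITY_INSERT [activetrackarchivedb].dbo.[Data Import] ON;

WHILE @rows > 0
BEGIN
    -- Move one small batch; each iteration commits on its own,
    -- so locks are released and the log can clear between batches.
    DELETE TOP (5000)
    FROM [activetrackdb].dbo.[Data Import]
      OUTPUT DELETED.id,
             DELETED.[Company id],
             DELETED.[Site id],
             DELETED.[Site name],
             DELETED.[Receive date],
             DELETED.[Detect date]
      INTO [activetrackarchivedb].dbo.[Data Import]
             (id, [Company id], [Site id], [Site name],
              [Receive date], [Detect date])
    WHERE [Receive Date] < @4daysago;

    SET @rows = @@ROWCOUNT;

    -- Give other sessions a chance to work before the next batch.
    WAITFOR DELAY '00:00:05';
    -- CHECKPOINT;  -- if the database is in SIMPLE recovery
    -- (or take a transaction log backup here if it is in FULL recovery)
END

SET IDENTITY_INSERT [activetrackarchivedb].dbo.[Data Import] OFF;

Because each batch touches only a few thousand rows, the web app can keep acquiring locks on the rest of the table while the loop runs, trading total run time for availability.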