SQL Server 2012 – How to Copy Large Table to Archive Table

sql-server, sql-server-2012

I have a table with my application's logs that has 161,626,718 rows (and growing), and inserts into it are very slow.
My idea is to keep only 6 months of logs in the table and copy all older records into an archive table that is not accessible by the application; if anyone needs some info, they can just select directly from that table.
So, the idea is to create a daily job like this:

SELECT * INTO audlog_backup FROM audlog WHERE XDATE <= DATEADD(day,-185,GETDATE())

DELETE FROM audlog WHERE XDATE <= DATEADD(day,-185,GETDATE())

But because of the size of the table, this will take ages and will slow down or even lock the database for some time on the first run, since most of the rows will be older than 6 months.

What's the recommended way to do this safely and as fast as possible the first time?

More information:

SQL Server 2012 Standard

Script of the create table:

CREATE TABLE [dbo].[audlog_backup](
    [PVKEY] [float] NULL,
    [DKEY] [varchar](14) NULL,
    [XDATE] [datetime] NULL,
    [XTIME] [varchar](8) NULL,
    [XFROM] [varchar](50) NULL,
    [XTO] [varchar](50) NULL,
    [DETAILS] [text] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
ALTER TABLE [dbo].[audlog_backup] ADD [XUSER] [varchar](50) NULL
ALTER TABLE [dbo].[audlog_backup] ADD [PEOPLEKEY] [int] NULL
ALTER TABLE [dbo].[audlog_backup] ADD [ADDRESSKEY] [int] NULL
ALTER TABLE [dbo].[audlog_backup] ADD [XFILEKEY] [varchar](50) NULL
ALTER TABLE [dbo].[audlog_backup] ADD [XLOGNO] [int] IDENTITY(1,1) NOT NULL
ALTER TABLE [dbo].[audlog_backup] ADD [XTABLE] [varchar](50) NULL
ALTER TABLE [dbo].[audlog_backup] ADD [XFIELD] [varchar](50) NULL
ALTER TABLE [dbo].[audlog_backup] ADD CONSTRAINT [PK_AUDLOG_backup] PRIMARY KEY NONCLUSTERED
(
    [XLOGNO] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Best Answer

You can use the following Stored Procedure, scheduled via a SQL Server Agent job, to archive records on a continual basis. You don't need to worry about the initial move of "everything older than 6 months" as it will be taken care of naturally by this process since it will be able to archive records faster than you are inserting them. Of course, if you want to get a large chunk of those rows archived immediately, you can run this Stored Procedure manually (while the job is not running), and just pass in parameter values that are slightly higher for @BatchSize and @SecondsToRun.

Important Notes:

  • Schedule this to run via SQL Server Agent, probably once per hour
  • Set defaults for @BatchSize and @SecondsToRun to match how fast data is coming into the table
  • Create a NonClustered Index on dbo.AudLog ([XDATE] ASC)
  • Assuming that XLOGNO is in the main AudLog table, remove the IDENTITY specification from the XLOGNO column here (in AudLog_Backup) as that value comes from the main AudLog table
  • Change the datatype of the DETAILS column to be VARCHAR(MAX): text has been deprecated since the release of SQL Server 2005 and is more difficult to work with.
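
The index and datatype changes recommended above might look like the following (the index name is an assumption; adjust to your naming conventions):

```sql
-- Assumed index name; supports the archive procedure's filter and sort on XDATE
CREATE NONCLUSTERED INDEX [IX_AudLog_XDATE]
    ON dbo.AudLog ([XDATE] ASC);

-- Replace the deprecated TEXT column with VARCHAR(MAX) on the archive table
ALTER TABLE dbo.AudLog_Backup ALTER COLUMN [DETAILS] VARCHAR(MAX) NULL;
```

Make the same DETAILS change on the main AudLog table as well if you can afford the maintenance window, so the two tables stay in sync.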

Minor notes:

  • Running this Stored Procedure, at least via the SQL Server Agent job if not also manually at first to get a large chunk of rows moved over, should be less work than swapping the table out, and there is no period of time when the AudLog table is empty.
  • The Stored Procedure will simply exit if there is nothing to archive, so it doesn't hurt to run once per hour, or maybe every 2 - 3 hours (depending on how fast new rows come in).
  • The Stored Procedure is throttled to exit after a certain number of seconds so that if it is causing contention on the main AudLog table, it will only be for short periods of time.
  • The OUTPUT clause binds the DELETE from the main AudLog table and the INSERT into the AudLog_Backup table in a Transaction, so you don't need to worry about the process failing, or system shutting down / crashing, between those two operations.
CREATE PROCEDURE dbo.AudLog_Archive
(
  @BatchSize INT = 2000, -- don't go above 3000 to avoid lock escalation
  @SecondsToRun INT = 180, -- run for 3 minutes
  @DaysToKeep INT = 185 -- archive rows older than 6 months
)
AS
SET NOCOUNT ON;

DECLARE @EndTime DATETIME = DATEADD(SECOND, @SecondsToRun, GETDATE()),
        @ArchiveDate DATETIME = DATEADD(DAY, @DaysToKeep * -1, GETDATE()),
        @RowsArchived INT = @BatchSize; -- initialize to be able to enter the loop


WHILE (@EndTime > GETDATE() AND @RowsArchived = @BatchSize)
BEGIN

    IF (EXISTS(
                 SELECT 1
                 FROM   dbo.AudLog al
                 WHERE  al.[XDATE] < @ArchiveDate
              )
        )
    BEGIN
        ;WITH batch AS
        (
            -- Keep this as SELECT * as it will alert you, via job failure, if
            -- you add columns to AudLog but forget to add them to AudLog_Backup
            SELECT  TOP (@BatchSize) al.*
            FROM    dbo.AudLog al
            WHERE   al.[XDATE] < @ArchiveDate
            ORDER BY al.[XDATE] ASC
        )
        DELETE b
        OUTPUT DELETED.* -- keep as * for same reason as noted above
        INTO   dbo.AudLog_Backup ([PVKEY], [DKEY], ...) -- specify all columns
        FROM   batch b;

        SET @RowsArchived = @@ROWCOUNT;

        WAITFOR DELAY '00:00:01.000'; -- one second delay for breathing room
    END;
    ELSE
    BEGIN
        BREAK;
    END;
END;
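
For the one-time backfill described above, you can call the procedure manually with larger parameter values. These numbers are only illustrative; tune them to your hardware and workload:

```sql
-- Initial catch-up run: keep @BatchSize modest (the notes above cap it
-- around 3000 to avoid lock escalation) but let it run much longer
EXEC dbo.AudLog_Archive
     @BatchSize    = 3000,
     @SecondsToRun = 3600,  -- up to one hour per manual run
     @DaysToKeep   = 185;
```

Repeat the manual run as needed until the backlog is gone; after that, the hourly Agent job keeps up on its own.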