SQL Server 2012 – Archiving High Write Logging Data

Tags: archive, sql-server, sql-server-2012

I have a logging database with a few lookup tables and one massive event table that currently holds 79 million rows. On busy days my code inserts over 3 million rows, so the event table would be enormous in production if not for two things: every half hour I copy the latest data to an archive database on a separate server, and every night I delete all rows more than one month old from production.

The problem is that these deletes now take over 10 hours to run. I don't think they cause any blocking as long as I put a 100 ms delay between each delete batch (script below), but I feel like things could be improved. Should the logging database be in snapshot isolation mode? We typically write a row and then update its EndTime column after the function has completed. Any suggestions on how to improve the process and/or speed it up in production would be appreciated.
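
On the snapshot isolation question: row versioning is a database-level setting, and it mainly helps readers stop blocking on writers such as the EndTime updates; it won't make the deletes themselves faster, and it adds version-store overhead in tempdb. For reference, the switches look like this (LoggingDb is a placeholder for the actual database name):

ALTER DATABASE LoggingDb SET ALLOW_SNAPSHOT_ISOLATION ON;  -- allows opt-in SNAPSHOT transactions
ALTER DATABASE LoggingDb SET READ_COMMITTED_SNAPSHOT ON;   -- default READ COMMITTED readers use row versions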


DECLARE @CutoffDate datetime = DATEADD(DAY, -30, GETDATE()),
        @Time datetime,
        @Stop bit = 0,
        @RowCount int;

WHILE (@Stop = 0)
BEGIN
    SET @Time = GETDATE()

    -- Remove fault rows tied to expired events first (they reference ApplicationEventID)
    DELETE TOP (100) af
    FROM ApplicationEvent ae
    JOIN ApplicationFault af ON af.ApplicationEventID = ae.ApplicationEventID
    WHERE ae.LogTime < @CutoffDate

    SET @RowCount = @@ROWCOUNT

    -- ...then remove expired events that no longer have any fault rows
    DELETE TOP (100) ae
    FROM ApplicationEvent ae
    LEFT JOIN ApplicationFault af ON af.ApplicationEventID = ae.ApplicationEventID
    WHERE ae.LogTime < @CutoffDate AND af.ApplicationFaultID IS NULL

    -- Count rows from BOTH deletes; stopping on the event delete alone
    -- can exit the loop while expired fault rows still remain
    SET @RowCount = @RowCount + @@ROWCOUNT
    SET @Stop = CASE @RowCount WHEN 0 THEN 1 ELSE 0 END

    --PRINT DATEDIFF(MS, @Time, GETDATE())  -- elapsed ms for this batch

    WAITFOR DELAY '00:00:00:100';  -- 100 ms pause between batches
END
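
For scale: nothing in the schema below indexes LogTime, so each TOP (100) delete may have to search a long stretch of the clustered index just to find 100 qualifying rows, and that search cost is paid again on every batch. A larger, tunable batch at least amortizes it; a sketch, with 5000 as an arbitrary starting point:

DECLARE @BatchSize int = 5000,
        @CutoffDate datetime = DATEADD(DAY, -30, GETDATE());

-- Same event delete as above, with the batch size pulled out for tuning
DELETE TOP (@BatchSize) ae
FROM ApplicationEvent ae
LEFT JOIN ApplicationFault af ON af.ApplicationEventID = ae.ApplicationEventID
WHERE ae.LogTime < @CutoffDate AND af.ApplicationFaultID IS NULL;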

Here's the schema.


CREATE TABLE [dbo].[ApplicationEvent](
    [ApplicationEventID] [int] IDENTITY(1,1) NOT NULL,
    [LogTime] [datetime] NOT NULL,
    [TransactionID] [int] NULL,
    [TransactionType] [int] NULL,
    [EventID] [int] NOT NULL,
    [EventLevelID] [smallint] NOT NULL,
    [StartTime] [datetime] NOT NULL,
    [EndTime] [datetime] NULL,
    [IPAddress] nvarchar NULL,
    [UserName] nvarchar NULL,
    [ApplicationID] [smallint] NOT NULL,
    [EventInformation] nvarchar NULL,
    [HostName] nvarchar NULL,
    [ApplicationSourceId] [smallint] NULL,
    CONSTRAINT [PK_ApplicationEvent_1] PRIMARY KEY CLUSTERED
    (
        [ApplicationEventID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Best Answer

Two suggestions come to mind.

One is to put the clustered index on both ApplicationEvent and ApplicationFault on the LogTime column. Assuming rows are posted to the tables in chronological order, you'll reduce fragmentation from page splits and benefit from range scans when purging older time periods.
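
A sketch of that change for ApplicationEvent, assuming the primary key scripted above is currently the clustered index (the new clustered index is intentionally non-unique, and any foreign key from ApplicationFault to ApplicationEventID would have to be dropped and re-created around the swap):

-- Demote the PK to nonclustered so LogTime can take the clustered slot
ALTER TABLE dbo.ApplicationEvent DROP CONSTRAINT PK_ApplicationEvent_1;

CREATE CLUSTERED INDEX IX_ApplicationEvent_LogTime
    ON dbo.ApplicationEvent (LogTime);

ALTER TABLE dbo.ApplicationEvent
    ADD CONSTRAINT PK_ApplicationEvent_1 PRIMARY KEY NONCLUSTERED (ApplicationEventID);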

The second builds on the first: implement partitioning based on the LogTime column. Instead of paying the relatively expensive I/O of the delete operations, you can implement a sliding window that 'slides' older time periods out of the main table with ALTER TABLE ... SWITCH commands, dropping unneeded dates via what are essentially metadata-only operations.
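
A minimal sketch of that sliding window, assuming monthly boundaries and a staging table with an identical structure (all names here are hypothetical, and table partitioning requires Enterprise Edition on SQL Server 2012):

-- One-time setup: monthly boundaries on LogTime; the table's clustered
-- index must then be rebuilt on psLogMonth(LogTime)
CREATE PARTITION FUNCTION pfLogMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('20130101', '20130201', '20130301');

CREATE PARTITION SCHEME psLogMonth
    AS PARTITION pfLogMonth ALL TO ([PRIMARY]);

-- Nightly/monthly purge: switch the oldest partition out (metadata-only),
-- drop its rows, then retire the now-empty boundary
ALTER TABLE dbo.ApplicationEvent
    SWITCH PARTITION 1 TO dbo.ApplicationEvent_Stage;

TRUNCATE TABLE dbo.ApplicationEvent_Stage;

ALTER PARTITION FUNCTION pfLogMonth() MERGE RANGE ('20130101');

Note that SWITCH requires the staging table to match the source's columns and indexes and to live on the same filegroup as the partition being switched out.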